boltzmann-policy-distribution

Name: boltzmann-policy-distribution
Version: 0.0.6
Home page: https://github.com/cassidylaidlaw/boltzmann-policy-distribution
Summary: Code for the ICLR 2022 paper "The Boltzmann Policy Distribution: Accounting for Systematic Suboptimality in Human Models"
Author: Cassidy Laidlaw
License: MIT
Keywords: human-robot interaction, machine learning, reinforcement learning
Upload time: 2023-03-22 21:05:52

# The Boltzmann Policy Distribution

This repository contains code and data for the ICLR 2022 paper [The Boltzmann Policy Distribution: Accounting for Systematic Suboptimality in Human Models](https://openreview.net/forum?id=_l_QjPGN5ye). In particular, the repository contains an implementation of our algorithm for computing the Boltzmann Policy Distribution (BPD) which is based around [RLlib](https://www.ray.io/rllib).

## Installation

The code can be downloaded as this GitHub repository or installed as a pip package.

### As a repository


1. Install [Python](https://www.python.org/) 3.8 or later (3.7 might work but may not be able to load pretrained checkpoints).
2. Clone the repository:

        git clone https://github.com/cassidylaidlaw/boltzmann-policy-distribution.git
        cd boltzmann-policy-distribution

3. Install pip requirements:

        pip install -r requirements.txt

### As a package

1. Install [Python 3](https://www.python.org/).
2. Install from PyPI:
    
        pip install boltzmann-policy-distribution

3. Import the package as follows:

        from bpd.agents.bpd_trainer import BPDTrainer

   See [getting_started.ipynb](getting_started.ipynb) or the Colab notebook below for examples of how to use the package.
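
Below is a minimal sketch of programmatic use, assuming `BPDTrainer` follows the standard RLlib `Trainer` interface (construct it with a config dict, then call `train()`, `save()`, and `restore()`). The environment name and the commented-out BPD hyperparameter keys are illustrative guesses taken from the experiment commands later in this README, not the package's documented API; see the notebook for real usage.

    # Hedged sketch: BPDTrainer is assumed to be a standard RLlib Trainer subclass.
    import ray
    from bpd.agents.bpd_trainer import BPDTrainer

    ray.init()
    trainer = BPDTrainer(
        config={
            "env": "overcooked_multi_agent",  # hypothetical registered env name
            "num_workers": 2,
            # BPD-specific hyperparameters (names copied from the commands below,
            # not verified as RLlib config keys):
            # "temperature": 0.1,
            # "prior_concentration": 0.2,
            # "latent_size": 1000,
        }
    )
    for _ in range(10):
        result = trainer.train()  # one RLlib training iteration
        print(result["episode_reward_mean"])
    checkpoint_path = trainer.save("data/logs/bpd_example")
    # A saved checkpoint (e.g. one of the pretrained ones) can be reloaded later with:
    # trainer.restore(checkpoint_path)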

## Data and Pretrained Models

Download human-human data from [here](https://boltzmann-policy-distribution.s3.us-east-2.amazonaws.com/human_data.zip).

Download pretrained models from [here](https://boltzmann-policy-distribution.s3.us-east-2.amazonaws.com/checkpoints.zip). The download includes a README describing which checkpoints are used where in the paper.

## Usage

This section explains how to get started with using the code and how to run the Overcooked experiments from the paper.

### Getting Started

The [getting_started.ipynb](getting_started.ipynb) notebook shows how to use the BPD to predict human behavior in a new environment. It is also available on Google Colab via the link below.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/cassidylaidlaw/boltzmann-policy-distribution/blob/main/getting_started.ipynb)

### Experiments

Each of the subsections below describes how to run various experiments from the paper. All experiment configuration is done using [Sacred](https://sacred.readthedocs.io/en/stable/), and parameters can be updated from the command line by adding `param=value` after the command. For instance, most of the experiments require setting the Overcooked layout by writing `layout_name="cramped_room"`.

We used [RLlib](https://www.ray.io/rllib) for reinforcement learning (RL), and many experiments output an RLlib checkpoint as their result. If a checkpoint from one experiment is needed for another, you can find it by looking at the output of the training run, which should look something like this:

    INFO - main - Starting training iteration 0
    INFO - main - Starting training iteration 1
    ...
    INFO - main - Saved final checkpoint to data/logs/self_play/ppo/cramped_room/2022-01-01_12-00-00/checkpoint_000500/checkpoint-500

Many experiments also log metrics to TensorBoard during training. Logs and checkpoints are stored in `data/logs` by default. You can open TensorBoard by running:

    pip install tensorboard
    tensorboard --logdir data/logs

#### Calculating the BPD

To calculate the BPD for Overcooked, we used the following command:

    python -m bpd.experiments.train_overcooked with run="bpd" num_workers=25 num_training_iters=2000 layout_name="cramped_room" temperature=0.1 prior_concentration=0.2 reward_shaping_horizon=20000000 latents_per_iteration=250  share_dense_reward=True train_batch_size=100000 discriminate_sequences=True max_seq_len=10 entropy_coeff_start=0 entropy_coeff_end=0 latent_size=1000 sgd_minibatch_size=8000 use_latent_attention=True

Some useful parameters include:

 * `temperature`: the parameter $1 / \beta$ from the paper, which controls how irrational or suboptimal the human is.
 * `prior_concentration`: the parameter $\alpha$ from the paper, which controls how inconsistent the human is.
 * `latent_size`: $n$, the size of the Gaussian latent vector $z$.
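
For example, the command above sets `temperature=0.1`, `prior_concentration=0.2`, and `latent_size=1000`, i.e. $\beta = 1 / 0.1 = 10$, $\alpha = 0.2$, and $n = 1000$; the pretrained checkpoint directories such as `bpd_0.1_0.2_1000` appear to be named after these three values.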

#### Training a predictive model for the BPD

In the paper, we describe training a sequence model (a transformer) to do online prediction of human actions using the BPD; we also experimented with using an RNN, and the commands below can train either. The first step is to roll out many episodes from the BPD:

    python -m bpd.experiments.rollout with checkpoint=data/checkpoints/cramped_room/bpd_0.1_0.2_1000/checkpoint_000500/checkpoint-500 run=bpd num_workers=10 episodes=5000

Replace the `checkpoint=` parameter with the path to your BPD checkpoint. Then, look for a directory called `rollouts_2022-...` under the checkpoint directory and pass it as the `input` parameter when running the sequence model training:

    python -m bpd.experiments.train_overcooked with run="distill" num_training_iters=5000 distill_random_policies=True layout_name="cramped_room" use_sequence_model=True use_lstm=False train_batch_size=16000 sgd_minibatch_size=16000 num_sgd_iter=1 size_hidden_layers=256 input="data/checkpoints/cramped_room/bpd_0.1_0.2_1000/checkpoint_000500/rollouts_2022-01-01_12-00-00" save_freq=1000

You can set `use_lstm=True` to use an LSTM instead of a transformer for prediction.

#### Evaluating prediction

We haven't used any human data up until now to train the BPD and the predictive model! However, to evaluate the predictive power of the BPD, we'll need the human trajectories included in the data download above. Assuming you've extracted them to `data/human_data`, you can run:

    python -m bpd.experiments.evaluate_overcooked_prediction with checkpoint_path=data/checkpoints/cramped_room/bpd_0.1_0.2_1000_transformer/checkpoint_005000/checkpoint-5000 run=distill human_data_fname="data/human_data/human_data_state_dict_and_action_by_traj_test_inserted_fixed.pkl" out_tag="test"

You should replace the `run=distill` parameter with whatever `run` parameter you used to **train** the model you want to evaluate. For instance, to evaluate the BPD directly using mean-field variational inference (MFVI), you could run:

    python -m bpd.experiments.evaluate_overcooked_prediction with checkpoint_path=data/checkpoints/cramped_room/bpd_0.1_0.2_1000/checkpoint_000500/checkpoint-500 run=bpd human_data_fname="data/human_data/human_data_state_dict_and_action_by_traj_test_inserted_fixed.pkl" out_tag="test"

#### Training a best response

Besides using the BPD to predict human actions, we might also want to use it to enable human-AI cooperation. We can do this by training a *best response* to the BPD which will learn to cooperate with all the policies in the BPD and thus hopefully with real humans as well. To train a best response, run:

    python -m bpd.experiments.train_overcooked with run="ppo" num_workers=10 num_training_iters=500 multiagent_mode="cross_play" checkpoint_to_load_policies=data/checkpoints/cramped_room/bpd_0.1_0.2_1000/checkpoint_000500/checkpoint-500 layout_name=cramped_room evaluation_interval=None entropy_coeff_start=0 entropy_coeff_end=0 share_dense_reward=True train_batch_size=100000 sgd_minibatch_size=8000

You can replace the `checkpoint_to_load_policies` parameter with any other checkpoint you want to train a best response to. For instance, [human-aware RL](https://github.com/HumanCompatibleAI/human_aware_rl) (HARL) is just a best response to a behavior-cloned (BC) policy. To train a HARL policy, follow the instructions below to train a BC policy and then use that checkpoint with the command above.

#### Training a behavior cloning/human proxy policy

To train a behavior-cloned (BC) human policy from the human data, run:

    python -m bpd.experiments.train_overcooked_bc with layout_name="cramped_room" human_data_fname="data/human_data/human_data_state_dict_and_action_by_traj_train_inserted_fixed.pkl" save_freq=10 num_training_iters=100 validation_prop=0.1

By default, this will use special, hand-engineered features as the input to the policy network. To use the normal Overcooked features, add `use_bc_features=False` to the command. To train a BC policy on the test set, use `human_data_fname="data/human_data/human_data_state_dict_and_action_by_traj_test_inserted_fixed.pkl"` instead.

#### Evaluating with a human proxy

We evaluated cooperative AI policies in the paper by testing how well they performed when paired with a human proxy policy trained via behavior cloning on the test set of human data. To test a best response policy, run:

    python -m bpd.experiments.evaluate_overcooked with layout_name=cramped_room run_0=ppo checkpoint_path_0=data/checkpoints/cramped_room/bpd_0.1_0.2_1000_br/checkpoint_002000/checkpoint-2000 policy_id_0=ppo_0 run_1=bc checkpoint_path_1=data/checkpoints/cramped_room/bc_test/checkpoint_000500/checkpoint-500 num_games=100 evaluate_flipped=True ep_length=400 out_tag=hproxy

If you want to test a policy which *isn't* a best response with the human proxy, remove the `policy_id_0=ppo_0` parameter and update the `run_0` parameter to whatever `run` parameter you used when training the policy.

#### Baselines

To train a **self-play policy**, run:

    python -m bpd.experiments.train_overcooked with run="ppo" num_workers=10 num_training_iters=500 layout_name="cramped_room" prior_concentration=1 reward_shaping_horizon=20000000 share_dense_reward=True train_batch_size=100000 entropy_coeff_start=0 entropy_coeff_end=0 sgd_minibatch_size=8000

To train a **Boltzmann rational policy**, use the same command but set `entropy_coeff_start=0.1 entropy_coeff_end=0.1` for $1 / \beta = 0.1$.

To train a human model using **generative adversarial imitation learning (GAIL)**, run:

    python -m bpd.experiments.train_overcooked with run="gail" num_workers=10 num_training_iters=500 layout_name=cramped_room prior_concentration=1 reward_shaping_horizon=20000000 share_dense_reward=True train_batch_size=100000 num_sgd_iter=1 entropy_coeff_start=0.1 entropy_coeff_end=0.1 human_data_fname="data/human_data/human_data_state_dict_and_action_by_traj_train_inserted_fixed.pkl" sgd_minibatch_size=8000

## Citation

If you find this repository useful for your research, please cite our paper as follows:

    @inproceedings{laidlaw2022boltzmann,
      title={The Boltzmann Policy Distribution: Accounting for Systematic Suboptimality in Human Models},
      author={Laidlaw, Cassidy and Dragan, Anca},
      booktitle={ICLR},
      year={2022}
    }

## Contact

For questions about the paper or code, please contact cassidy_laidlaw@berkeley.edu.

            
