# Reverb
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/dm-reverb)
[![PyPI version](https://badge.fury.io/py/dm-reverb.svg)](https://badge.fury.io/py/dm-reverb)
Reverb is an efficient and easy-to-use data storage and transport system
designed for machine learning research. Reverb is primarily used as an
experience replay system for distributed reinforcement learning algorithms but
the system also supports multiple data structure representations such as FIFO,
LIFO, and priority queues.
## Table of Contents
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Detailed Overview](#detailed-overview)
  - [Tables](#tables)
  - [Item Selection Strategies](#item-selection-strategies)
  - [Rate Limiting](#rate-limiting)
  - [Sharding](#sharding)
  - [Checkpointing](#checkpointing)
- [Citation](#citation)
## Installation
Please keep in mind that Reverb is not hardened for production use, and while we
do our best to keep things in working order, things may break or segfault.
> :warning: Reverb currently only supports Linux-based OSes.
The recommended way to install Reverb is with `pip`. We also provide instructions
to build from source using the same docker images we use for releases.
TensorFlow can be installed separately or as part of the `pip` install.
Installing TensorFlow as part of the install ensures compatibility.
```shell
$ pip install dm-reverb[tensorflow]
# Without TensorFlow install and version dependency check.
$ pip install dm-reverb
```
### Nightly builds
[![PyPI version](https://badge.fury.io/py/dm-reverb-nightly.svg)](https://badge.fury.io/py/dm-reverb-nightly)
```shell
$ pip install dm-reverb-nightly[tensorflow]
# Without TensorFlow install and version dependency check.
$ pip install dm-reverb-nightly
```
### Debug builds
Starting with version 0.6.0, debug builds of Reverb are uploaded to Google Cloud
Storage. The builds can be downloaded or installed directly via `pip` following
the patterns below. `gsutil` can be used to navigate the directory structure
to ensure the files are there, e.g.
`gsutil ls gs://rl-infra-builds/dm_reverb/builds/dbg`. To build your own debug
binary, see the
[build instructions](https://github.com/deepmind/reverb/tree/master/reverb/pip_package#create-a-stable-reverb-release).
For Python 3.8 and 3.9 follow this pattern:
```shell
$ export reverb_version=0.8.0
# Python 3.9
$ export python_version=39
$ pip install https://storage.googleapis.com/rl-infra-builds/dm_reverb/builds/dbg/$reverb_version/dm_reverb-$reverb_version-cp$python_version-cp$python_version-manylinux2010_x86_64.whl
```
### Build from source
[This guide](reverb/pip_package/README.md#how-to-develop-and-build-reverb-with-the-docker-containers)
details how to build Reverb from source.
### Reverb Releases
Due to some underlying libraries such as `protoc` and `absl`, Reverb has to be
paired with a specific version of TensorFlow. If installing Reverb with
`pip install dm-reverb[tensorflow]`, the correct version of TensorFlow will be
installed. The table below lists the version of TensorFlow that each release of
Reverb is associated with and some versions of interest:
* 0.13.0 dropped Python 3.8 support.
* 0.11.0 first version to support Python 3.11.
* 0.10.0 last version to support Python 3.7.
Release | Branch / Tag | TensorFlow Version
------- | ---------------------------------------------------------- | ------------------
Nightly | [master](https://github.com/deepmind/reverb) | tf-nightly
0.14.0 | [v0.14.0](https://github.com/deepmind/reverb/tree/v0.14.0) | 2.14.0
0.13.0 | [v0.13.0](https://github.com/deepmind/reverb/tree/v0.13.0) | 2.14.0
0.12.0 | [v0.12.0](https://github.com/deepmind/reverb/tree/v0.12.0) | 2.13.0
0.11.0 | [v0.11.0](https://github.com/deepmind/reverb/tree/v0.11.0) | 2.12.0
0.10.0 | [v0.10.0](https://github.com/deepmind/reverb/tree/v0.10.0) | 2.11.0
0.9.0 | [v0.9.0](https://github.com/deepmind/reverb/tree/v0.9.0) | 2.10.0
0.8.0 | [v0.8.0](https://github.com/deepmind/reverb/tree/v0.8.0) | 2.9.0
0.7.x | [v0.7.0](https://github.com/deepmind/reverb/tree/v0.7.0) | 2.8.0
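
When TensorFlow is installed separately, the pairing in the table can be turned into pinned requirements. The following is only a convenience sketch: the `REVERB_TO_TENSORFLOW` dictionary and the `pinned_requirements` helper are hypothetical names, not part of Reverb, and the version pairs are copied from the table above (the `0.7.x` row is left out because it does not name a single release).

```python
# Release -> TensorFlow version pairs copied from the table above.
REVERB_TO_TENSORFLOW = {
    "0.14.0": "2.14.0",
    "0.13.0": "2.14.0",
    "0.12.0": "2.13.0",
    "0.11.0": "2.12.0",
    "0.10.0": "2.11.0",
    "0.9.0": "2.10.0",
    "0.8.0": "2.9.0",
}


def pinned_requirements(reverb_version):
    """Returns pip requirement strings for a Reverb release and its TensorFlow."""
    tf_version = REVERB_TO_TENSORFLOW[reverb_version]
    return [f"dm-reverb=={reverb_version}", f"tensorflow=={tf_version}"]


print(pinned_requirements("0.12.0"))
```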
## Quick Start
Starting a Reverb server is as simple as:
```python
import reverb
server = reverb.Server(tables=[
    reverb.Table(
        name='my_table',
        sampler=reverb.selectors.Uniform(),
        remover=reverb.selectors.Fifo(),
        max_size=100,
        rate_limiter=reverb.rate_limiters.MinSize(1)),
    ],
)
```
Create a client to communicate with the server:
```python
client = reverb.Client(f'localhost:{server.port}')
print(client.server_info())
```
Write some data to the table:
```python
# Creates a single item and data element [0, 1].
client.insert([0, 1], priorities={'my_table': 1.0})
```
An item can also reference multiple data elements:
```python
# Appends three data elements and inserts a single item which references all
# of them as {'a': [2, 3, 4], 'b': [12, 13, 14]}.
with client.trajectory_writer(num_keep_alive_refs=3) as writer:
    writer.append({'a': 2, 'b': 12})
    writer.append({'a': 3, 'b': 13})
    writer.append({'a': 4, 'b': 14})

    # Create an item referencing all the data.
    writer.create_item(
        table='my_table',
        priority=1.0,
        trajectory={
            'a': writer.history['a'][:],
            'b': writer.history['b'][:],
        })

    # Block until the item has been inserted and confirmed by the server.
    writer.flush()
```
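
To see why slicing each column with `[:]` yields `{'a': [2, 3, 4], 'b': [12, 13, 14]}`, it can help to model the history as one list per field. The sketch below is plain Python and does not use Reverb; the `history` dictionary and the `append` function are hypothetical stand-ins for the writer's behavior, not its implementation.

```python
from collections import defaultdict

# Hypothetical stand-in for the writer's per-column history; not Reverb code.
history = defaultdict(list)


def append(step):
    """Records one step by appending each field to its own column."""
    for key, value in step.items():
        history[key].append(value)


append({'a': 2, 'b': 12})
append({'a': 3, 'b': 13})
append({'a': 4, 'b': 14})

# Slicing every column with [:] selects all three steps.
trajectory = {key: column[:] for key, column in history.items()}
print(trajectory)  # {'a': [2, 3, 4], 'b': [12, 13, 14]}
```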
The items we have added to Reverb can be read by sampling them:
```python
# client.sample() returns a generator.
print(list(client.sample('my_table', num_samples=2)))
```
Continue with the
[Reverb Tutorial](https://github.com/deepmind/reverb/tree/master/examples/demo.ipynb)
for an interactive tutorial.
## Detailed overview
Experience replay has become an important tool for training off-policy
reinforcement learning policies. It is used by algorithms such as
[Deep Q-Networks (DQN)][DQN], [Soft Actor-Critic (SAC)][SAC],
[Deep Deterministic Policy Gradients (DDPG)][DDPG], and
[Hindsight Experience Replay][HER], among others. However, building an
efficient, easy-to-use, and scalable replay system can be challenging. For good
performance, Reverb is implemented in C++, and to enable distributed usage it
provides a gRPC service for adding, sampling, and updating the contents of the
tables. Python clients expose the full functionality of the service in an
easy-to-use fashion. Furthermore, native TensorFlow ops are available for
performant integration with TensorFlow and `tf.data`.
Although originally designed for off-policy reinforcement learning, Reverb's
flexibility makes it just as useful for on-policy reinforcement learning, or even
(un)supervised learning. Creative users have even used Reverb to store and
distribute frequently updated data (such as model weights), acting as an
in-memory lightweight alternative to a distributed file system where each table
represents a file.
### Tables
A Reverb `Server` consists of one or more tables. A table holds items, and each
item references one or more data elements. Tables also define sample and
removal [selection strategies](#item-selection-strategies), a maximum item
capacity, and a [rate limiter](#rate-limiting).
Multiple items can reference the same data element, even if these items exist in
different tables. This is because items only contain references to data elements
(as opposed to a copy of the data itself). This also means that a data element
is only removed when there exists no item that contains a reference to it.
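
The reference semantics described above can be sketched with simple reference counting. This is a toy model in plain Python, assuming one counter per data element; the `ChunkStore` class and its method names are hypothetical and say nothing about how Reverb stores data internally.

```python
class ChunkStore:
    """Toy store: data elements live only while some item references them."""

    def __init__(self):
        self.data = {}        # element key -> payload
        self.ref_counts = {}  # element key -> number of referencing items
        self.items = {}       # (table, item key) -> list of element keys

    def add_element(self, key, payload):
        self.data[key] = payload
        self.ref_counts[key] = 0

    def insert_item(self, table, item_key, element_keys):
        self.items[(table, item_key)] = list(element_keys)
        for key in element_keys:
            self.ref_counts[key] += 1

    def delete_item(self, table, item_key):
        for key in self.items.pop((table, item_key)):
            self.ref_counts[key] -= 1
            if self.ref_counts[key] == 0:
                # No item references this element any more, so drop it.
                del self.data[key]
                del self.ref_counts[key]


store = ChunkStore()
for key, payload in enumerate(['s0', 's1', 's2']):
    store.add_element(key, payload)

# Two items in different tables share elements 0 and 1.
store.insert_item('per_table', 'transition', [0, 1])
store.insert_item('queue_table', 'sequence', [0, 1, 2])

store.delete_item('queue_table', 'sequence')
print(sorted(store.data))  # [0, 1]: element 2 lost its only reference

store.delete_item('per_table', 'transition')
print(sorted(store.data))  # []
```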
For example, it is possible to set up one Table as a Prioritized Experience
Replay (PER) for transitions (sequences of length 2), and another Table as a
(FIFO) queue of sequences of length 3. In this case the PER data could be used
to train DQN, and the FIFO data to train a transition model for the environment.
![Using multiple tables](docs/images/multiple_tables_example.png)
Items are automatically removed from the Table when one of two conditions is
met:
1. Inserting a new item would cause the number of items in the Table to exceed
   its maximum capacity. The Table's removal strategy is used to determine
   which item to remove.
1. An item has been sampled the maximum number of times permitted by the
   Table's `max_times_sampled` setting. Such an item is deleted.
Data elements not referenced anymore by any item are also deleted.
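
Both removal conditions can be illustrated with a small toy table. The sketch assumes a FIFO removal strategy and a per-item sample limit named `max_times_sampled` (matching the `Table` argument used elsewhere in this README), with `0` treated as "no limit" purely as a convention of the sketch; `ToyTable` and `sample_oldest` are hypothetical names.

```python
from collections import OrderedDict


class ToyTable:
    """Toy table that removes items on overflow or after enough samples."""

    def __init__(self, max_size, max_times_sampled=0):
        self.max_size = max_size
        self.max_times_sampled = max_times_sampled  # 0 means no limit here
        self.items = OrderedDict()  # item key -> times sampled, oldest first

    def insert(self, key):
        # Condition 1: make room by removing the oldest item (FIFO remover).
        if len(self.items) >= self.max_size:
            self.items.popitem(last=False)
        self.items[key] = 0

    def sample_oldest(self):
        key = next(iter(self.items))
        self.items[key] += 1
        # Condition 2: the item has used up its allowed number of samples.
        if self.max_times_sampled and self.items[key] >= self.max_times_sampled:
            del self.items[key]
        return key


table = ToyTable(max_size=2, max_times_sampled=2)
for key in ['a', 'b', 'c']:
    table.insert(key)
print(list(table.items))  # ['b', 'c']: 'a' was removed to make room

table.sample_oldest()
table.sample_oldest()
print(list(table.items))  # ['c']: 'b' was sampled twice and deleted
```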
Users have full control over how data is sampled and removed from Reverb
tables. The behavior is primarily controlled by the
[item selection strategies](#item-selection-strategies) provided to the `Table`
as the `sampler` and `remover`. In combination with the
[`rate_limiter`](#rate-limiting) and `max_times_sampled`, a wide range of
behaviors can be achieved. Some commonly used configurations include:
**Uniform Experience Replay**
The `N=1000` most recently inserted items are maintained. By setting
`sampler=reverb.selectors.Uniform()`, the probability to select an item is the
same for all items. Due to `reverb.rate_limiters.MinSize(100)`, sampling
requests will block until 100 items have been inserted. By setting
`remover=reverb.selectors.Fifo()`, the oldest item is removed first when an item
needs to be removed.
```python
reverb.Table(
    name='my_uniform_experience_replay_buffer',
    sampler=reverb.selectors.Uniform(),
    remover=reverb.selectors.Fifo(),
    max_size=1000,
    rate_limiter=reverb.rate_limiters.MinSize(100),
)
```
Examples of algorithms that make use of uniform experience replay include [SAC]
and [DDPG].
**Prioritized Experience Replay**
The `N=1000` most recently inserted items are maintained. By setting
`sampler=reverb.selectors.Prioritized(priority_exponent=0.8)`, the probability
to select an item is proportional to the item's priority raised to the power
`0.8`.
Note: See [Schaul, Tom, et al.][PER] for the algorithm used in this
implementation of Prioritized Experience Replay.
```python
reverb.Table(
    name='my_prioritized_experience_replay_buffer',
    sampler=reverb.selectors.Prioritized(0.8),
    remover=reverb.selectors.Fifo(),
    max_size=1000,
    rate_limiter=reverb.rate_limiters.MinSize(100),
)
```
Examples of algorithms that make use of Prioritized Experience Replay are DQN
(and its variants), and
[Distributed Distributional Deterministic Policy Gradients][D4PG].
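
As a rough illustration of how a priority exponent shapes sampling, the sketch below computes selection probabilities following the formulation in the Prioritized Experience Replay paper, where each priority is raised to the exponent and then normalized. The function name `sampling_probabilities` is hypothetical, and the sketch assumes Reverb's `Prioritized` selector follows the same formula.

```python
def sampling_probabilities(priorities, priority_exponent):
    """Returns P(i) = p_i ** exponent / sum_k(p_k ** exponent)."""
    weights = [p ** priority_exponent for p in priorities]
    total = sum(weights)
    return [w / total for w in weights]


priorities = [1.0, 2.0, 4.0]

# An exponent of 0 ignores priorities, giving uniform sampling.
print(sampling_probabilities(priorities, priority_exponent=0.0))

# An exponent of 0.8 favors high-priority items, but less than an exponent of 1.
probs = sampling_probabilities(priorities, priority_exponent=0.8)
print(probs)
```

Lower exponents flatten the distribution toward uniform sampling, while higher exponents concentrate sampling on the highest-priority items.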
**Queue**
A collection of up to `N=1000` items where the oldest item is selected and
removed in the same operation. If the collection contains 1000 items, insert
calls are blocked until it is no longer full; if the collection is empty, sample
calls are blocked until there is at least one item.
```python
reverb.Table(
    name='my_queue',
    sampler=reverb.selectors.Fifo(),
    remover=reverb.selectors.Fifo(),
    max_size=1000,
    max_times_sampled=1,
    rate_limiter=reverb.rate_limiters.Queue(size=1000),
)

# Or use the helper classmethod `.queue`.
reverb.Table.queue(name='my_queue', max_size=1000)
```
Examples of algorithms that make use of Queues are
[IMPALA](https://arxiv.org/abs/1802.01561) and asynchronous implementations of
[Proximal Policy Optimization](https://arxiv.org/abs/1707.06347).
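
The blocking behavior described for the queue configuration resembles a bounded FIFO queue. The following sketch uses Python's standard `queue.Queue` as an analogy only (it is not how Reverb implements the table), with a capacity of 3 standing in for `max_size=1000` and non-blocking calls used so the example terminates.

```python
import queue

toy_queue = queue.Queue(maxsize=3)  # stands in for max_size=1000
for item in ['first', 'second', 'third']:
    toy_queue.put_nowait(item)

try:
    toy_queue.put_nowait('fourth')
    insert_blocked = False
except queue.Full:
    insert_blocked = True  # a blocking put() would wait here instead

# Each get returns the oldest item and removes it in the same operation.
sampled = [toy_queue.get_nowait() for _ in range(3)]

try:
    toy_queue.get_nowait()
    sample_blocked = False
except queue.Empty:
    sample_blocked = True  # a blocking get() would wait for an insert

print(insert_blocked, sampled, sample_blocked)
```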
### Item selection strategies
Reverb defines several selectors that can be used for item sampling or removal:
- **Uniform:** Samples uniformly among all items.
- **Prioritized:** Samples proportionally to stored priorities.
- **FIFO:** Selects the oldest data.
- **LIFO:** Selects the newest data.
- **MinHeap:** Selects data with the lowest priority.
- **MaxHeap:** Selects data with the highest priority.
Any of these strategies can be used for sampling or removing items from a
Table. This gives users the flexibility to create customized Tables that best
fit their needs.
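
The six strategies can be summarized in a few lines of plain Python. This is a conceptual sketch only: the `select` function and the lowercase strategy names are hypothetical, items are modeled as `(key, priority)` pairs in insertion order, and the priority exponent of the prioritized strategy is omitted for brevity.

```python
import random


def select(strategy, items, rng=random):
    """Picks one key from `items`, a list of (key, priority) in insertion order."""
    if strategy == 'uniform':
        return rng.choice(items)[0]
    if strategy == 'prioritized':
        keys = [key for key, _ in items]
        weights = [priority for _, priority in items]
        return rng.choices(keys, weights=weights, k=1)[0]
    if strategy == 'fifo':
        return items[0][0]   # oldest
    if strategy == 'lifo':
        return items[-1][0]  # newest
    if strategy == 'min_heap':
        return min(items, key=lambda pair: pair[1])[0]
    if strategy == 'max_heap':
        return max(items, key=lambda pair: pair[1])[0]
    raise ValueError(f'unknown strategy: {strategy}')


items = [('a', 3.0), ('b', 1.0), ('c', 5.0)]
for strategy in ['fifo', 'lifo', 'min_heap', 'max_heap']:
    print(strategy, select(strategy, items))
```

Because the same function serves both roles, pairing for example `'uniform'` for sampling with `'fifo'` for removal mirrors the uniform replay buffer configuration shown earlier.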
### Rate Limiting
Rate limiters allow users to enforce conditions on when items can be inserted
and/or sampled from a Table. Here is a list of the rate limiters that are
currently available in Reverb:
- **MinSize:** Sets a minimum number of items that must be in the Table before
  anything can be sampled.
- **SampleToInsertRatio:** Enforces an average ratio of samples to inserts by
  blocking insert and/or sample requests. This is useful for controlling the
  number of times each item is sampled before being removed.
- **Queue:** Items are sampled exactly once before being removed.
- **Stack:** Items are sampled exactly once before being removed.
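
One way to picture a ratio-based rate limiter is as a running difference between the samples "earned" by inserts and the samples actually taken, kept within a tolerance band. The sketch below is a simplified model under that assumption; the parameter names mirror the `rate_limiter` fields of the `reverb_server` config in this README (`samples_per_insert`, `min_size_to_sample`, `min_diff`, `max_diff`), but the `ToyRateLimiter` class and its exact blocking rule are illustrative and may differ from Reverb's implementation.

```python
import sys


class ToyRateLimiter:
    """Toy limiter tracking inserts * samples_per_insert - samples."""

    def __init__(self, samples_per_insert, min_size_to_sample, min_diff, max_diff):
        self.samples_per_insert = samples_per_insert
        self.min_size_to_sample = min_size_to_sample
        self.min_diff = min_diff
        self.max_diff = max_diff
        self.inserts = 0
        self.samples = 0

    def can_insert(self):
        # Inserts are always allowed until the minimum size is reached.
        if self.inserts + 1 <= self.min_size_to_sample:
            return True
        diff = (self.inserts + 1) * self.samples_per_insert - self.samples
        return diff <= self.max_diff

    def can_sample(self):
        if self.inserts < self.min_size_to_sample:
            return False
        diff = self.inserts * self.samples_per_insert - self.samples - 1
        return diff >= self.min_diff


limiter = ToyRateLimiter(
    samples_per_insert=2.0, min_size_to_sample=1, min_diff=-2.0, max_diff=2.0)
print(limiter.can_sample())  # False: nothing has been inserted yet
limiter.inserts += 1
print(limiter.can_insert())  # False: inserts are too far ahead of samples
limiter.samples += 2
print(limiter.can_insert())  # True: sampling has caught up

# With an unbounded band only the minimum size matters, as with MinSize(1).
min_size_like = ToyRateLimiter(
    1.0, 1, min_diff=-sys.float_info.max, max_diff=sys.float_info.max)
```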
### Sharding
Reverb servers are unaware of each other, and when scaling up a system to a
multi-server setup, data is not replicated across more than one node. This makes
Reverb unsuitable as a traditional database but has the benefit of making it
trivial to scale up systems where some level of data loss is acceptable.
Distributed systems can be horizontally scaled by simply increasing the number
of Reverb servers. When used in combination with a gRPC-compatible load
balancer, the address of the load-balanced target can simply be provided to a
Reverb client and operations will automatically be distributed across the
different nodes. You'll find details about the specific behaviors in the
documentation of the relevant methods and classes.
If a load balancer is not available in your setup, or if more control is
required, systems can still be scaled in almost the same way: increase the
number of Reverb servers and create a separate client for each server.
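
Without a load balancer, the caller decides which server receives each operation. A minimal round-robin sketch is shown below; to keep it runnable without any servers it uses a hypothetical `StubClient` class in place of real clients, and the addresses are made up. With real servers, each stub would be replaced by a client created for that server's address.

```python
import itertools


class StubClient:
    """Hypothetical stand-in for a per-server client; records inserts locally."""

    def __init__(self, address):
        self.address = address
        self.inserted = []

    def insert(self, data, priorities):
        self.inserted.append((data, priorities))


# Hypothetical addresses, one per server.
addresses = ['host-a:8000', 'host-b:8000', 'host-c:8000']
clients = [StubClient(address) for address in addresses]

# Cycle through the clients so inserts are spread evenly across servers.
next_client = itertools.cycle(clients)
for step in range(6):
    next(next_client).insert([step], priorities={'my_table': 1.0})

for client in clients:
    print(client.address, len(client.inserted))
```

Since servers do not share data, a sampler built this way also has to read from every client to see all inserted items.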
### Checkpointing
Reverb supports checkpointing; the state and content of Reverb servers can be
stored to permanent storage. While checkpointing, the `Server` serializes all of
its data and metadata needed to reconstruct it. During this process the `Server`
blocks all incoming insert, sample, update, and delete requests.
Checkpointing is done with a call from the Reverb `Client`:
```python
# client.checkpoint() returns the path the checkpoint was written to.
checkpoint_path = client.checkpoint()
```
To restore the `reverb.Server` from a checkpoint:
```python
# The checkpointer accepts the path of the root directory in which checkpoints
# are written. If we pass the root directory of the checkpoints written above
# then the new server will load the most recent checkpoint written from the old
# server.
checkpointer = reverb.platform.checkpointers_lib.DefaultCheckpointer(
    path=checkpoint_path.rsplit('/', 1)[0])
# The arguments passed to `tables=` must be the same as those used by the
# `Server` that wrote the checkpoint.
server = reverb.Server(tables=[...], checkpointer=checkpointer)
```
Refer to
[tfrecord_checkpointer.h](https://github.com/deepmind/reverb/tree/master/reverb/cc/platform/tfrecord_checkpointer.h)
for details on the implementation of checkpointing in Reverb.
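
The `rsplit('/', 1)[0]` expression in the restore example strips the last path component, leaving the root directory that holds all checkpoints. The path below is hypothetical and only illustrates the string manipulation; the actual layout of checkpoint directories is not specified here.

```python
import posixpath

# Hypothetical path of the kind returned by client.checkpoint().
checkpoint_path = '/tmp/reverb_checkpoints/2024-02-13T11:07:47'

# Split once from the right and keep everything before the last '/'.
root_dir = checkpoint_path.rsplit('/', 1)[0]
print(root_dir)  # /tmp/reverb_checkpoints

# For paths without a trailing slash this matches posixpath.dirname.
assert root_dir == posixpath.dirname(checkpoint_path)
```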
## Starting Reverb using `reverb_server` (beta)
Installing `dm-reverb` using `pip` will install a `reverb_server` script, which
accepts its config as a textproto. For example:
```bash
$ reverb_server --config="
port: 8000
tables: {
  table_name: \"my_table\"
  sampler: {
    fifo: true
  }
  remover: {
    fifo: true
  }
  max_size: 200 max_times_sampled: 5
  rate_limiter: {
    min_size_to_sample: 1
    samples_per_insert: 1
    min_diff: $(python3 -c "import sys; print(-sys.float_info.max)")
    max_diff: $(python3 -c "import sys; print(sys.float_info.max)")
  }
}"
```
The `rate_limiter` config is equivalent to the Python expression `MinSize(1)`;
see `rate_limiters.py`.
## Citation
If you use this code, please cite the
[Reverb paper](https://arxiv.org/abs/2102.04736) as
```
@misc{cassirer2021reverb,
title={Reverb: A Framework For Experience Replay},
author={Albin Cassirer and Gabriel Barth-Maron and Eugene Brevdo and Sabela Ramos and Toby Boyd and Thibault Sottiaux and Manuel Kroiss},
year={2021},
eprint={2102.04736},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
```
<!-- Links to papers go here -->
[D4PG]: https://arxiv.org/abs/1804.08617
[DDPG]: https://arxiv.org/abs/1509.02971
[DQN]: https://www.nature.com/articles/nature14236
[HER]: https://arxiv.org/abs/1707.01495
[PER]: https://arxiv.org/abs/1511.05952
[SAC]: https://arxiv.org/abs/1801.01290