![Python package](https://github.com/jvanasco/dogpile_backend_redis_advanced/workflows/Python%20package/badge.svg)

This package supports Python 2 and Python 3.

This package DOES NOT support `dogpile.cache>=1.0`. Support is planned, but there have been several major API changes that are incompatible.

dogpile_backend_redis_advanced
==============================

This is a plugin for the **dogpile.cache** system that offers some alternatives
to the standard **Redis** datastore implementation.

Two new backends are offered:

| backend | description |
| --- | --- |
| `dogpile_backend_redis_advanced` | extends the `dogpile.cache.redis` backend and allows for custom pickling overrides |
| `dogpile_backend_redis_advanced_hstore` | extends `dogpile_backend_redis_advanced` and allows for some specific hstore operations |

There is a negligible performance hit in `dogpile_backend_redis_advanced_hstore`,
as cache keys must be inspected to determine whether or not they are hstores --
and some extra operations are needed to coordinate values.

Additionally, some behavior is changed:

* The constructor now accepts a ``lock_class`` argument, which can be used to
  wrap a mutex and alter how releases are handled.  This can be necessary if you
  have a distributed lock and timeout or flush issues (via LRU or otherwise):
  a lock disappearing in Redis will raise a fatal exception under the standard
  Redis backend.
* The constructor now accepts a ``lock_prefix`` argument, which can be used to
  alter the prefix used for locks.  The standard Redis backend uses `_lock` as
  the prefix, which can be hard to read or isolate in tests.  One might want
  to use "\_" as the lock prefix (so that `KEYS "\_*"` will show all locks).
  A configuration sketch follows this list.

Purpose:
--------

Mike Bayer's **dogpile.cache** is an excellent package for general purpose
development.

The system offers 3 key features:

1. Elegant read-through caching functionality.
2. A locking mechanism that ensures only the first request of a cache-miss will
   create the resource (turning the rest into consumers of the first-requestor's
   creation).
3. Integrated cache expiry against time and library versions.


Unfortunately, the integrated cache expiry feature comes at a cost -- objects
are wrapped into a tuple with some metadata and pickled before hitting the
datastore.

The additional metadata or pickle format may not be needed or wanted.  Look how
the size of "a" grows by the time it becomes something passed off to Redis:


| type  | example |
| ----- | ------- |
| string                        | a                                                                                                               |
| pickle(string)                | S'a'\np0\n.                                                                                                     |
| CachedValue(string)           | ('a', {'ct': 1471113698.76127, 'v': 1})                                                                         |
| pickle(CachedValue(string) )  | cdogpile.cache.api\nCachedValue\np0\n(S'a'\np1\n(dp2\nS'ct'\np3\nF1471113698.76127\nsS'v'\np4\nI1\nstp5\nRp6\n. |

By adding in hooks for custom serializers, this backend lets developers choose
better ways to cache data.

You may want a serializer that doesn't care about the expiry of cached data and
just uses simpler strings:

| type  | example 1 | example 2 |
| ----- | --------- | --------- |
| string                                | a                         | mellifluous                         |
| json.dumps(string)                    | "a"                       | "mellifluous"                       |
| msgpack.packb(string)                 | \xa1a                     | \xabmellifluous                     |

Or, you may want to fool **dogpile.cache** by manipulating what is cached.
Instead of a Python dict of time and API version, you might track only the
time, and only to the second.

| type | example 1 | example 2 |
| ---- | --------- | --------- |
| AltCachedValue(string)                | ('a', 1471113698)         | ('mellifluous', 1471113698)         |
| json.dumps(AltCachedValue(string))    | '["a", 1471113698]'       | '["mellifluous", 1471113698]'       |
| msgpack.packb(AltCachedValue(string)) | '\x92\xa1a\xceW\xafi\xe2' | '\x92\xabmellifluous\xceW\xafi\xe2' |
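
A quick way to reproduce the flavor of these tables (exact byte counts vary by
Python version and pickle protocol; `msgpack` is assumed installed, as it is in
this package's requirements):

    import json
    import pickle
    import msgpack

    value = "a"
    # the shape of dogpile.cache's wrapper: (payload, {'ct': timestamp, 'v': version})
    wrapped = (value, {"ct": 1471113698.76127, "v": 1})

    print(len(pickle.dumps(value)))    # pickled bare string
    print(len(json.dumps(value)))      # '"a"' -> 3 characters
    print(len(msgpack.packb(value)))   # b'\xa1a' -> 2 bytes
    print(len(pickle.dumps(wrapped)))  # the full dogpile-style parcel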


This is how **dogpile.cache** stores "a":

	cdogpile.cache.api\nCachedValue\np0\n(S'a'\np1\n(dp2\nS'ct'\np3\nF1471113698.76127\nsS'v'\np4\nI1\nstp5\nRp6\n.

This package lets us cache a raw string and trick **dogpile.cache** into
thinking our data parcel is "timely":

	a

Or, we include a simpler version of the time, along with a different serializer.

This packet of data and time:

	["a", 1471113698]

Is then serialized to:

	\x92\xa1a\xceW\xafi\xe2
	
If you cache lots of big objects, **dogpile.cache**'s overhead is minimal -- but
if you have a cache that maps short bits of text, like ids to usernames (and
vice-versa), you will see considerable savings.

Another way to make **Redis** more efficient is to use hash storage.

Let's say you have a lot of keys that look like this:

	region.set("user-15|posts", x)
	region.set("user-15|friends", y)
	region.set("user-15|profile", z)
	region.set("user-15|username", z1)

You could make **Redis** a bit more efficient by using hash storage, in which
you have 1 key with multiple fields:

	region.hset("user-15", {'posts': x,
							'friends', y,
							'profile', z,
							'username', z1,
							})

Redis tends to operate much more efficiently in this situation (more below),
and you can also save some bytes by not repeating the key prefix. Instagram's
engineering team has a great article on this:
[Instagram Engineering](http://instagram-engineering.tumblr.com/post/12202313862/storing-hundreds-of-millions-of-simple-key-value).

90% of **dogpile.cache** users who choose **Redis** will never need this
package.  A decent number of other users with large datasets have been trying to
squeeze every last bit of memory and performance out of their machines -- and
this package is designed to facilitate that.


Usage:
------

myfile.py

    # importing will register the plugins
    import dogpile_backend_redis_advanced

Then simply configure **dogpile.cache** with `dogpile_backend_redis_advanced` or
`dogpile_backend_redis_advanced_hstore` as the backend.
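
A minimal sketch (the connection arguments follow the standard
`dogpile.cache.redis` backend, which this package extends):

    from dogpile.cache import make_region
    import dogpile_backend_redis_advanced  # importing registers the plugins

    region = make_region().configure(
        'dogpile_backend_redis_advanced',
        expiration_time=3600,
        arguments={
            'host': 'localhost',
            'port': 6379,
            'db': 0,
        },
    )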


RedisAdvancedBackend
--------------------

Two new configuration options are offered to specify custom serializers via
`loads` and `dumps`.  The default is **dogpile.cache**'s choice of `pickle`.

This option was designed to support `msgpack` as the serializer:

    import msgpack
    from dogpile.cache import make_region
    from dogpile.cache.api import CachedValue

    def msgpack_loads(value):
        """pickle maintains the `CachedValue` wrapper around the tuple;
        msgpack does not, so it must be added back in.
        """
        value = msgpack.unpackb(value, use_list=False)
        return CachedValue(*value)

    region = make_region().configure(
        'dogpile_backend_redis_advanced',
        arguments={'loads': msgpack_loads,
                   'dumps': msgpack.packb,
                   },
        )


One can also abuse/misuse **dogpile.cache** and defer all cache expiry to
**Redis** using this serializer hook.

**dogpile.cache** doesn't cache your value as-is, but wraps it in a CachedValue
object which contains an API version and a timestamp for the expiry.

This format is necessary for most cache backends, but **Redis** offers the
ability to handle expiry itself.  By using the slim msgpack format and
only storing the payload, you can drastically cut down the bytes needed to store
this information.

This approach SHOULD NOT BE USED by 99% of users.  However, if you do aggressive
caching, this will allow you to leverage **dogpile.cache**'s excellent locking
mechanism for handling read-through caching while slimming down your cache size
and the traffic on-the-wire.  

    import time
    import msgpack
    from dogpile.cache import make_region
    from dogpile.cache.api import CachedValue
    from dogpile.cache.region import value_version

    def raw_dumps(value):
        '''pull the payload out of the CachedValue and serialize that
        '''
        value = value.payload
        value = msgpack.packb(value)
        return value

    def raw_loads(value):
        '''unpack the value and return a CachedValue with the current time
        '''
        value = msgpack.unpackb(value, use_list=False)
        return CachedValue(
            value,
            {
                "ct": time.time(),
                "v": value_version,
            })

    region = make_region().configure(
        'dogpile_backend_redis_advanced',
        arguments={'loads': raw_loads,
                   'dumps': raw_dumps,
                   'redis_expiration_time': 1,
                   },
        )


RedisAdvancedHstoreBackend
--------------------------

This backend extends **RedisAdvancedBackend** with drop-in support for Hash
storage under Redis.

* If key names are tuples, they will be treated as hash operations on Redis.
* By setting `redis_expiration_time_hash` to a boolean value, you can control
  how expiry times work within Redis.

This backend has a slight, negligible overhead:

* All key operations (`get`/`get_multi`/`set`/`set_multi`/`delete`) require an
  inspection of keys.
* `get_multi` requires the order of keys to be tracked, and results from
  multiple `get`/`hget` operations are then correlated.
* `set_multi` requires the mapping to be analyzed and bucketed into different
  `hmset` calls.

`redis_expiration_time_hash` allows some extended management of expiry in Redis.
By default it is set to `None`.  A configuration sketch follows the list below.

* `False` - ignore hash expiry (never set a TTL in Redis).
* `None` - set `redis_expiration_time` on new hash creation only. This requires
  a check to the **Redis** key before a set.
* `True` - unconditionally set `redis_expiration_time` on every hash key
  set/update.
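
A configuration sketch combining these options (the tuple-key behavior is the
one described above; `None` is already the default and is shown here only for
explicitness):

    from dogpile.cache import make_region
    import dogpile_backend_redis_advanced  # importing registers the plugins

    region = make_region().configure(
        'dogpile_backend_redis_advanced_hstore',
        arguments={
            'host': 'localhost',
            'redis_expiration_time': 3600,
            'redis_expiration_time_hash': None,  # TTL on new hash creation only
        },
    )

    # tuple keys become hash operations, roughly: HSET "user-15" "posts" <value>
    region.set(("user-15", "posts"), "some-value")
    posts = region.get(("user-15", "posts"))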

Please note the following:

* **Redis** manages the expiry of hashes on the key, making it global for all
  fields in the hash.
* **Redis** does not support setting a TTL on hashes while doing another
  operation.  TTL must be set via another request.
* If `redis_expiration_time_hash` is set to `True`, there will be 2 calls to
  the **Redis** API for every key: `hset` or `hmset`, then `expire`.
* If `redis_expiration_time_hash` is set to `None`, there will be 2-3 calls to
  the **Redis** API for every key: `exists`, `hset` or `hmset`, and possibly
  `expire`.
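
In raw redis-py terms, the `True` case corresponds to a call pattern like this
(a sketch of the round-trips involved, not the backend's actual code):

    import redis

    r = redis.StrictRedis(host='localhost')

    # one write call plus one TTL call per key;
    # the TTL applies to the whole hash, not the individual field
    r.hset('user-15', 'posts', 'some-serialized-value')
    r.expire('user-15', 3600)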


Memory Savings and Suggested Usage
--------------------------------------

Redis is an in-memory datastore that offers persistence -- optimizing storage is
incredibly important because the entire dataset must be held in memory.

### Example Demo

The attached `demo.py` (results in `demo.txt`) shows some potential approaches
to caching and hashing by priming a **Redis** datastore with a single dataset
under several possible strategies.

It's worth looking at `demo.txt` to see how the different serializers encode the
data -- sample keys are pulled for each format.

| test                     | memory bytes | memory human | relative size | ttl on Redis? | ttl in dogpile? | backend                                 | encoder |
| ------------------------ | ------------ | ------------ | ------------- | ------------- | --------------- | --------------------------------------- | ------- |
| region_redis             | 249399504    | 237.85M      | 100% (baseline) | Y           | Y               | `dogpile.cache.redis`                   | pickle  |
| region_json              | 222924496    | 212.60M      | 89.38%   | Y             | Y               | `dogpile_backend_redis_advanced`        | json    |
| region_msgpack           | 188472048    | 179.74M      | 75.57%   | Y             | Y               | `dogpile_backend_redis_advanced`        | msgpack |
| region_redis_local       | 181501200    | 173.09M      | 72.78%   | -             | Y               | `dogpile.cache.redis`                   | pickle  |
| region_json_raw          | 171554880    | 163.61M      | 68.79%   | Y             | -               | `dogpile_backend_redis_advanced`        | json    |
| region_msgpack_raw       | 170765872    | 162.86M      | 68.47%   | Y             | -               | `dogpile_backend_redis_advanced`        | msgpack |
| region_json_local        | 162612752    | 155.08M      | 65.20%   | -             | Y               | `dogpile_backend_redis_advanced`        | json    |
| region_json_local_int    | 128648576    | 122.69M      | 51.58%   | -             | Y, `int(time)`  | `dogpile_backend_redis_advanced`        | json    |
| region_msgpack_local     | 128160048    | 122.22M      | 51.39%   | -             | Y               | `dogpile_backend_redis_advanced`        | msgpack |
| region_msgpack_local_int | 126938576    | 121.06M      | 50.89%   | -             | Y, `int(time)`  | `dogpile_backend_redis_advanced`        | msgpack |
| region_json_raw_local    | 111241280    | 106.09M      | 44.60%   | -             | -               | `dogpile_backend_redis_advanced`        | json    |
| region_msgpack_raw_local | 110455968    | 105.34M      | 44.29%   | -             | -               | `dogpile_backend_redis_advanced`        | msgpack |
| region_msgpack_raw_hash  | 28518864     | 27.20M       | 11.44%   | Y, only keys  | -               | `dogpile_backend_redis_advanced_hstore` | msgpack |
| region_json_raw_hash     | 24836160     | 23.69M       |  9.96%   | Y, only keys  | -               | `dogpile_backend_redis_advanced_hstore` | json    |

Notes:

* the `_local` variants do not set a TTL on Redis
* the `_raw` variants strip out the dogpile CachedValue wrapper and only store
  the payload
* the `_msgpack` variants use msgpack instead of pickle
* the `_json` variants use json instead of pickle
* the `_int` variants apply `int()` to the dogpile timestamp, dropping a few
  bytes per entry

Wait WHAT? LOOK AT `region_msgpack_raw_hash` and `region_json_raw_hash` - that's
a HUGE savings!

Yes.

The HSTORE approach yields considerable savings for two reasons:

* **Redis** internally manages a hash much more effectively than keys.
* **Redis** will only put an expiry on the keys (buckets), not the hash fields.

HSTORE ends up using much less memory on this example set because we're setting
100 fields in each key.  The savings would not be as dramatic if you were
setting only 5-10 fields per key.

Note that `region_msgpack_raw_local` and `region_json_raw_local` should not be
used unless you're running an LRU eviction policy -- they have no expiry.

### Assumptions

This demo assumes a few things that are not tested here (but there are
plenty of benchmarks on the internet showing them):

* msgpack is the fastest encoder for serializing and deserializing data.
* json outperforms cPickle on serializing; cPickle outperforms json on
  deserializing data.

Here are some benchmarks and links:

* https://gist.github.com/justinfx/3174062
* https://gist.github.com/cactus/4073643
* http://www.benfrederickson.com/dont-pickle-your-data/

#### Caveats

In the examples above, we deal with (de)serializing simple, native datatypes:
`string`, `int`, `bool`, `list`, `dict`, `tuple`.  For these datatypes, msgpack
produces both the smallest payloads and the fastest (de)serialization.

If you need to store more complex types, you will need to provide a custom
encoder/decoder and will likely suffer a performance hit on the speed of
(de)serialization.  Unfortunately, the more complex data types that require
custom encoding/decoding include standard `datetime` objects, which can be
annoying.

The file `custom_serializer.py` shows an example class for handling
(de)serialization -- `MsgpackSerializer`.  Some common `datetime` formats are
supported; they are encoded as a specially formatted dict, and decoded
correspondingly.  A few tricks are used to shave off time and make it roughly
comparable to the speed of pickle.
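
The general pattern (a sketch of the technique, not `MsgpackSerializer`'s exact
code) uses msgpack's `default`/`object_hook` hooks with a marker dict; the
`__dt__` key below is a hypothetical choice, and msgpack>=1.0 is assumed so
that unpacked keys are `str`:

    import datetime
    import msgpack

    def _default(obj):
        # encode datetimes as a specially formatted dict
        if isinstance(obj, datetime.datetime):
            return {"__dt__": obj.strftime("%Y-%m-%d %H:%M:%S")}
        raise TypeError("cannot serialize %r" % (obj,))

    def _object_hook(obj):
        # decode the marker dict back into a datetime
        if "__dt__" in obj:
            return datetime.datetime.strptime(obj["__dt__"], "%Y-%m-%d %H:%M:%S")
        return obj

    packed = msgpack.packb({"created": datetime.datetime(2016, 8, 13)}, default=_default)
    unpacked = msgpack.unpackb(packed, object_hook=_object_hook)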


### Key Takeaways

* This was surprising: while the differences are negligible on small datasets,
  using **Redis** to track expiry on large datasets is generally not a good
  idea(!). **dogpile.cache** tracks this data much more efficiently.  You can
  enable an LRU policy in **Redis** to aid in expiry.
* msgpack and json are usually fairly comparable in size [remember the
  assumption that msgpack is better for speed].
* Reformatting the **dogpile.cache** metadata (replacing the `dict` with an
  `int()` of the expiry) saves a lot of space under JSON when you have small
  payloads; the strings are a fraction of the size.
* msgpack is really good with nested data structures.

The following payloads for the value `10` are strings:

    region_json_local =        '[10, {"v": 1, "ct": 1471113698.76127}]'
    region_json_local_int =    '[10, 1471113753]'
    region_msgpack_local =     '\x92\n\x82\xa1v\x01\xa2ct\xcbA\xd5\xeb\x92\x83\xe9\x97\x9a'
    region_msgpack_local_int = '\x92\n\xceW\xafct'
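
These can be reproduced directly (your timestamps will differ):

    import json
    import time
    import msgpack

    payload = 10
    print(json.dumps([payload, {"v": 1, "ct": time.time()}]))  # region_json_local shape
    print(json.dumps([payload, int(time.time())]))             # region_json_local_int shape
    print(msgpack.packb([payload, int(time.time())]))          # region_msgpack_local_int shape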


### So what should you use?

There are several tradeoffs and concepts to consider:

1. Do you want to access the cached data outside of **dogpile.cache** (in other
   Python scripts, or even in another language)?
2. Are you worried about the time to serialize/deserialize?  Are you write-heavy
   or read-heavy?
3. Do you want the TTL to be handled by **Redis** or within Python?
4. What are your expiry needs?  What do your keys look like?  There may not be
   any savings possible, but if you have a lot of recycled prefixes, there
   could be.
5. What do your values look like?  How many are there?

This test uses a particular dataset, and the differences are inherent to the
types of data and keys. Using the strategies from the `region_msgpack_raw_hash`
test on our production data has consistently dropped a 300MB **Redis** footprint
to the 60-80MB range.

The **Redis** configuration file is also enclosed. The above tests were run with
**Redis** compression turned on (which is why memory size fluctuates in the full
demo reporting).


Custom Lock Classes
-------------------

If your Redis db gets flushed, the lock will disappear. This will cause the
Redis backend to raise an exception EVEN THOUGH you have succeeded in generating
your data.

By using a ``lock_class``, you can catch the exception and decide what to do:
log it, continue on, raise an error?  It's up to you!

    import redis.exceptions

    class RedisDistributedLockProxy(object):
        """example lock wrapper

        this will silently pass if a LockError is encountered
        """
        mutex = None

        def __init__(self, mutex):
            self.mutex = mutex

        def acquire(self, *_args, **_kwargs):
            return self.mutex.acquire(*_args, **_kwargs)

        def release(self):
            try:
                self.mutex.release()
            except redis.exceptions.LockError:
                # the lock vanished out from under us (flush, TTL);
                # treat it as a safe timeout rather than a fatal error
                # log.debug("safe lock timeout")
                pass
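
To wire the proxy in, pass it as the ``lock_class`` argument when configuring
the region; a sketch, reusing the class above:

    from dogpile.cache import make_region
    import dogpile_backend_redis_advanced  # importing registers the plugins

    region = make_region().configure(
        'dogpile_backend_redis_advanced',
        arguments={
            'host': 'localhost',
            'distributed_lock': True,  # use Redis-based locking
            'lock_class': RedisDistributedLockProxy,
        },
    )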



To Do
--------------------------------------

I've been experimenting with handling the TTL within a hash bucket (instead of
using the **Redis** or **dogpile.cache** methods). This looks promising.  The
rationale is that it is easier for **Redis** to get/set an extra field from the
same hash than it is to make separate calls to TTL/EXPIRE.

In pseudocode:

    - hset('example', 'foo', 'bar')
    - expire('example', 3600)
    + hmset('example', {'foo': 'bar',
                        'expires': time.time() + 3600,
                        })
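
The read side of that idea would compare the stored field against the clock; a
rough redis-py sketch (`expires` here is an ordinary hash field, not a Redis
command):

    import time
    import redis

    r = redis.StrictRedis(host='localhost')

    value, expires = r.hmget('example', 'foo', 'expires')
    if expires is not None and float(expires) < time.time():
        value = None  # the field outlived its embedded TTL; treat as a miss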

I've also been experimenting with blessing the result into a subclass of `dict`
that does the object-pair decoding lazily, as needed.
That would speed up most use cases.
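
A rough sketch of that idea, with `LazyDecodeDict` as a hypothetical name:

    import msgpack

    class LazyDecodeDict(dict):
        """Hold raw msgpack bytes per field; decode a value only on first access."""

        def __getitem__(self, key):
            value = dict.__getitem__(self, key)
            if isinstance(value, bytes):
                value = msgpack.unpackb(value, use_list=False)
                dict.__setitem__(self, key, value)  # memoize the decoded value
            return value

    # fields arrive packed; decoding happens lazily, per field
    d = LazyDecodeDict({'posts': msgpack.packb(['post-1', 'post-2'])})
    print(d['posts'])  # -> ('post-1', 'post-2')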


Maturity
--------------------------------------

This package is pre-release.  I've been using these strategies in production
via a custom fork of **dogpile.cache** for several years, but am currently
migrating them to a plugin.


Maintenance and Upstream Compatibility
--------------------------------------

Some files in `/tests` are taken from **dogpile.cache** as-is:

*   /tests/conftest.py
*   /tests/cache/\__init__.py
*   /tests/cache/\_fixtures.py

They are the versions from **dogpile.cache** 0.6.2.

The core file, `/cache/backends/redis_advanced.py`, inherits from
**dogpile.cache**'s `/cache/backends/redis.py`.


Testing
-------

This ships with full tests.

Much of the core package and test fixtures are from **dogpile.cache** and
copyrighted by that project, which is available under the MIT license.

Tests are handled through `tox`.

Examples:

```
tox
tox -e py27 -- tests/cache/test_redis_backend.py
tox -e py27 -- tests/cache/test_redis_backend.py::RedisAdvanced_SerializedRaw_Test
tox -e py27 -- tests/cache/test_redis_backend.py::HstoreTest
``` 

Tests pass using the enclosed `redis.conf` file:

```
/usr/local/Cellar/redis/3.0.7/redis-server ./redis-server--6379.conf
```



License
-------

This project is available under the same MIT license as **dogpile.cache**.
            
