sharedbuffers


:Name: sharedbuffers
:Version: 1.2.1
:Home page: https://github.com/jampp/sharedbuffers/
:Summary: Shared-memory structured buffers
:Upload time: 2023-10-10 20:36:15
:Maintainer: Claudio Freire
:Author: Jampp
:License: BSD 3-Clause

.. _using-sharedbuffers:

Using sharedbuffers
===================

This library implements shared-memory typed buffers that can be read and manipulated efficiently, without
serialization or deserialization. Write support is planned but not available yet.

The main supported way of obtaining shared memory is memory-mapping files. The library also supports
mapping buffers (anonymous mmap objects), although those are harder to share among processes.
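
As a rough illustration of the idea, a fixed-layout record written to a file can be read in place from a memory
mapping. This sketch uses only the standard library `mmap` and `struct` modules, not this library's API; the record
layout and the names `RECORD` and `read_record` are assumptions made for the example:

.. code:: python

    import mmap
    import struct
    import tempfile

    # Hypothetical fixed layout: one signed 64-bit int followed by one double.
    RECORD = struct.Struct('<qd')

    def read_record(mapping, index):
        # Decode a single record straight from the mapping at its byte offset.
        return RECORD.unpack_from(mapping, index * RECORD.size)

    with tempfile.TemporaryFile() as f:
        for i in range(3):
            f.write(RECORD.pack(i, i * 0.5))
        f.flush()

        # Any process that maps the same file sees the same bytes.
        mapping = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            second = read_record(mapping, 1)
        finally:
            mapping.close()

Here `second` is `(1, 0.5)`. Only the 16 bytes of the requested record are decoded; the rest of the file is never
parsed.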

Supported primitive types:

    * int (up to 64 bit precision)
    * str (bytes)
    * unicode
    * frozenset
    * tuple / list
    * dict
    * buffer
    * date
    * datetime
    * numpy arrays
    * decimal

Primitive types can be cloned into their actual builtin objects (as specified by the mapped types), which is fast,
but potentially memory-intensive. In addition, they can be proxied, in which case they will be built directly
on top of the memory mapping, without the need for constructing the actual object. Proxied objects aim at supporting
the same interface as the builtin containers.
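
The difference between cloning and proxying can be sketched with the standard library alone. The class
`PointProxy` and its layout are hypothetical and only illustrate the concept; they are not this library's proxy
implementation:

.. code:: python

    import struct

    class PointProxy(object):
        """Hypothetical proxy over a packed (int64, double) pair."""
        _layout = struct.Struct('<qd')

        def __init__(self, buf, offs=0):
            self._buf = buf
            self._offs = offs

        @property
        def a(self):
            # Decode only the requested field, at access time.
            return struct.unpack_from('<q', self._buf, self._offs)[0]

        @property
        def b(self):
            return struct.unpack_from('<d', self._buf, self._offs + 8)[0]

        def clone(self):
            # Materialize a regular builtin object (here, a tuple).
            return self._layout.unpack_from(self._buf, self._offs)

    buf = bytearray(PointProxy._layout.pack(3, 1.5))
    proxy = PointProxy(buf)
    cloned = proxy.clone()

    # The proxy reads the buffer on every access; the clone is a snapshot.
    struct.pack_into('<q', buf, 0, 7)

After the last line, `proxy.a` is `7` while `cloned` is still `(3, 1.5)`: the proxy holds no data of its own, whereas
the clone costs memory but no longer depends on the buffer.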

Objects can be registered with schema serializers, so composite types can be mapped as well. For this to work,
a class needs a class attribute specifying the attributes its instances hold and the type of each attribute. When an
attribute doesn't have a clearly defined type, it can be wrapped in an RTTI-containing container (one that carries
run-time type information) by specifying its type as `object`.

For example:

.. code:: python

    class SomeStruct(object):
        __slot_types__ = {
            'a' : int,
            'b' : float,
            's' : str,
            'u' : unicode,
            'fset' : frozenset,
            'l' : list,
            'o' : object,
        }
        __slots__ = __slot_types__.keys()

Adding `__slot_types__`, however, isn't enough to make the object mappable. A schema definition needs to be created,
which can be used to map files or buffers and obtain proxies to the information within:

.. code:: python

    from sharedbuffers import mapped_struct

    class SomeStruct(object):
        __slot_types__ = {
            'a' : int,
            'b' : float,
            's' : str,
            'u' : unicode,
            'fset' : frozenset,
            'l' : list,
            'o' : object,
        }
        __slots__ = __slot_types__.keys()
        __schema__ = mapped_struct.Schema.from_typed_slots(__slot_types__)

Using the schema is thus straightforward:

.. code:: python

    s = SomeStruct()
    s.a = 3
    s.s = 'blah'
    s.fset = frozenset([1,3])
    s.o = 3
    s.__schema__.pack(s) # returns a bytearray

    buf = bytearray(1000)

    # writes in offset 10 of buf, returns the size of the written object
    s.__schema__.pack_into(s, buf, 10)

    # returns a proxy for the object just packed into buf, does not deserialize
    p = s.__schema__.unpack_from(buf, 10)

    print p.a
    print p.s
    print p.fset

.. _composite-types:

Declaring compound types
------------------------

Typed objects can be nested, but for that a typecode must be assigned to each type in order for `RTTI` to properly
identify the custom types:

.. code:: python

    SomeStruct.__mapped_type__ = mapped_struct.mapped_object.register_schema(
        SomeStruct, SomeStruct.__schema__, 'S')

From then on, `SomeStruct` can be used as any other type when declaring field types.
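
The role of a typecode can be illustrated with a toy tagged encoding: each value is prefixed with a one-character
code, and a registry maps that code back to the right decoder. Everything here (`REGISTRY`, `register`,
`pack_tagged`, `unpack_tagged`, and the byte format) is hypothetical and is not the library's actual format:

.. code:: python

    import struct

    # Hypothetical registry: one-character typecode -> (type, packer, unpacker).
    REGISTRY = {}

    def register(typ, typecode, packer, unpacker):
        REGISTRY[typecode] = (typ, packer, unpacker)

    def pack_tagged(value):
        # Prefix the payload with the typecode so a reader can tell what follows.
        for typecode, (typ, packer, _) in REGISTRY.items():
            if type(value) is typ:
                return typecode.encode('ascii') + packer(value)
        raise TypeError('unregistered type: %r' % (type(value),))

    def unpack_tagged(data):
        # Read the typecode, then dispatch to the matching unpacker.
        typecode = bytes(data[:1]).decode('ascii')
        _, _, unpacker = REGISTRY[typecode]
        return unpacker(data[1:])

    register(int, 'q', struct.Struct('<q').pack,
             lambda d: struct.unpack('<q', d)[0])
    register(float, 'd', struct.Struct('<d').pack,
             lambda d: struct.unpack('<d', d)[0])

    packed_int = pack_tagged(3)
    packed_float = pack_tagged(2.5)

`unpack_tagged(packed_int)` returns `3` and `unpack_tagged(packed_float)` returns `2.5`, without the reader knowing
the type in advance. A type without a registered code cannot be written at all, which is why custom types need one
before they can be nested.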

.. _container-structures:

Container structures
--------------------

High-level typed container_ classes can be created by inheriting the proper base class.

The API for these high-level container objects is aimed at collections that don't really fit in RAM in their
pure-python form, so they must be built using an iterator over the items (ideally a generator that doesn't
put the whole collection in memory at once), and then mapped from the resulting file or buffer.

Currently, three kinds of mappings are supported: string-to-object, uint-to-object and a generic object-to-object.
The first two are provided for efficiency's sake; use the generic one when the others won't do.

.. code:: python

    class StructArray(mapped_struct.MappedArrayProxyBase):
        schema = SomeStruct.__schema__
    class StructNameMapping(mapped_struct.MappedMappingProxyBase):
        IdMapper = mapped_struct.StringIdMapper
        ValueArray = StructArray
    class StructIdMapping(mapped_struct.MappedMappingProxyBase):
        IdMapper = mapped_struct.NumericIdMapper
        ValueArray = StructArray
    class StructObjectMapping(mapped_struct.MappedMappingProxyBase):
        IdMapper = mapped_struct.ObjectIdMapper
        ValueArray = StructArray

An example:

.. code:: python

    import tempfile

    with tempfile.NamedTemporaryFile() as destfile:
        arr = StructArray.build([SomeStruct(), SomeStruct()], destfile=destfile)
        print arr[0]

    with tempfile.NamedTemporaryFile() as destfile:
        arr = StructNameMapping.build(dict(a=SomeStruct(), b=SomeStruct()).iteritems(), destfile=destfile)
        print arr['a']

    with tempfile.NamedTemporaryFile() as destfile:
        arr = StructIdMapping.build({1:SomeStruct(), 3:SomeStruct()}.iteritems(), destfile=destfile)
        print arr[3]

.. _idmap-usage:

When using nested hierarchies, it's possible to unify references to the same object by specifying an `idmap` dict.
However, since the idmap will map objects by their `id()`, objects must be kept alive by holding references to
them while they're still referenced in the idmap, so its usage is non-trivial. An example technique:

.. code:: python

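    import itertools

    def all_structs(idmap):
        # some_generator is assumed to be an iterable of SomeStruct objects
        iter_all = iter(some_generator)
        while True:
            # entries from the previous batch are stale now; drop them
            # before any more objects get packed
            idmap.clear()

            # the list keeps a whole batch alive while the idmap refers to it
            sstructs = list(itertools.islice(iter_all, 10000))
            if not sstructs:
                break

            for ss in sstructs:
                # mapping from "s" attribute to struct
                yield (ss.s, ss)
            del sstructs

    idmap = {}
    # destfile is assumed to be an open file, as in the previous example
    name_mapping = StructNameMapping.build(all_structs(idmap),
        destfile=destfile, idmap=idmap)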

The above code syncs the lifetime of objects and their idmap entries to avoid mapping issues. The invariant is that
every object referenced in the idmap is still alive and therefore holds a unique `id()` value. If that invariant
isn't maintained, the result will be silent corruption of the resulting mapping due to object identity mixups.
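
The mechanism can be reduced to a few lines. In this sketch `pack_all` is a hypothetical stand-in for a packer:
it "writes" each distinct object once and uses an `id()`-keyed dict to point repeated references at the same
position. It does not use this library's API:

.. code:: python

    def pack_all(objects, out, idmap):
        """Write each distinct object to out once; return one position per input."""
        positions = []
        for obj in objects:
            key = id(obj)
            if key not in idmap:
                idmap[key] = len(out)
                out.append(repr(obj))  # stands in for writing the packed bytes
            positions.append(idmap[key])
        return positions

    shared = ('x', 'y')
    batch = [shared, ('z',), shared]   # the list keeps every object alive

    out = []
    idmap = {}
    positions = pack_all(batch, out, idmap)

    # Drop the idmap entries before dropping the objects they describe.
    idmap.clear()
    del batch

`positions` is `[0, 1, 0]` and `out` holds only two entries, because both references to `shared` resolve to the same
position. Had `batch` been released while `idmap` still held its ids, a later object could receive a recycled `id()`
and be silently mapped to the wrong position.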

There are variants of the mapping proxy classes and their associated id mapper classes that implement multi-maps.
That is, mappings that, when fed with multiple values for a key, will return a list of values for that key rather
than a single value. Their in-memory representation is identical, but their querying API returns all matching values
rather than the first one, so multi-maps and simple mappings are binary compatible.
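
To see how one representation can serve both query styles, consider a sorted array of keys with a parallel array of
values. This layout and the function names are assumptions made for illustration, not a description of the library's
internal format:

.. code:: python

    import bisect

    def build_index(pairs):
        # Sort by key; keys and values live in two parallel arrays.
        ordered = sorted(pairs)
        keys = [k for k, _ in ordered]
        values = [v for _, v in ordered]
        return keys, values

    def get_first(index, key):
        # Simple-mapping query: the first value stored under the key.
        keys, values = index
        i = bisect.bisect_left(keys, key)
        if i == len(keys) or keys[i] != key:
            raise KeyError(key)
        return values[i]

    def get_all(index, key):
        # Multi-map query: every value stored under the key.
        keys, values = index
        lo = bisect.bisect_left(keys, key)
        hi = bisect.bisect_right(keys, key)
        return values[lo:hi]

    index = build_index([(1, 'a'), (3, 'b'), (3, 'c'), (7, 'd')])

On the same `index`, `get_first(index, 3)` returns `'b'` while `get_all(index, 3)` returns `['b', 'c']`. Only the
query differs; the stored data does not.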

Multi-maps with string keys can also be approximate, meaning the original keys will be discarded and the mapping will
only work with hashes. This makes the map much faster and more compact, at the expense of some inaccuracy: the
returned values may include extra values corresponding to other keys whose hashes collide with the one being requested.
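
The trade-off is easy to reproduce with a deliberately weak hash. `toy_hash`, `build_approximate` and `lookup` are
hypothetical names for this sketch; the library's hash function and storage differ:

.. code:: python

    from collections import defaultdict

    def toy_hash(key):
        # Deliberately weak hash (byte sum) so that collisions are easy to trigger.
        return sum(bytearray(key.encode('utf-8'))) % 251

    def build_approximate(pairs):
        # Only hashes are stored; the original keys are discarded.
        table = defaultdict(list)
        for key, value in pairs:
            table[toy_hash(key)].append(value)
        return dict(table)

    def lookup(table, key):
        # Returns every value whose key hashed to the same bucket.
        return table.get(toy_hash(key), [])

    table = build_approximate([('ab', 1), ('ba', 2), ('zz', 3)])

Because `'ab'` and `'ba'` have the same byte sum, `lookup(table, 'ab')` returns `[1, 2]`: the value for `'ba'` comes
along as a false positive. Callers that need exact answers have to re-check the returned values against the key they
asked for.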

Running tests
-------------

Tests can be run locally or on Docker, using the script `run-tests.sh`:

.. code:: shell

  $> virtualenv venv
  $> . venv/bin/activate
  $> sh ./run-tests.sh


Alternatively, the tests can be run on Docker with the following command:

.. code:: shell

  $> docker run -v ${PWD}:/opt/sharedbuffers -w /opt/sharedbuffers python:2.7 /bin/sh run-tests.sh

.. _container: https://en.wikipedia.org/wiki/Container_(abstract_data_type)

            
