pbjson
======
Packed Binary JSON extension for Python
``pbjson`` is a packed binary JSON encoder and decoder for Python 2.5+
and Python 3.3+. It is pure Python code with no dependencies, but
includes an optional C extension for a serious speed boost.
``pbjson`` can be used standalone or as an extension to the standard
``json`` module or to ``simplejson``, from which code was heavily
borrowed. The latest documentation for ``simplejson`` can be read online
here: http://simplejson.readthedocs.org/
The encoder can be specialized to provide serialization in any kind of
situation, without any special support by the objects to be serialized
(somewhat like pickle). This is best done with the ``default`` kwarg to
``dumps``.
The decoder can handle incoming JSON strings of any specified encoding
(UTF-8 by default). It can also be specialized to post-process JSON
objects with the ``object_hook`` or ``object_pairs_hook`` kwargs. This
is particularly useful for implementing protocols that have a richer
type system than JSON itself.
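
As a rough illustration of both hooks, the sketch below round-trips a
``complex`` value. It uses the standard ``json`` module so that it runs
without ``pbjson`` installed; that ``pbjson.dumps`` and ``pbjson.loads``
accept the same ``default`` and ``object_hook`` arguments is taken from the
description above and is not verified here. The helper names and the
``__complex__`` tag are invented for this example.

.. code:: python

    import json

    def encode_complex(obj):
        """default hook: called for objects the encoder cannot serialize."""
        if isinstance(obj, complex):
            return {"__complex__": [obj.real, obj.imag]}
        raise TypeError(repr(obj) + " is not serializable")

    def decode_complex(dct):
        """object_hook: called with every decoded JSON object (dict)."""
        if "__complex__" in dct:
            real, imag = dct["__complex__"]
            return complex(real, imag)
        return dct

    # Hypothetical payload; any unsupported type could be handled the same way.
    text = json.dumps({"z": 1 + 2j}, default=encode_complex)
    restored = json.loads(text, object_hook=decode_complex)

The tagged-dict approach keeps the wire format plain JSON types, so a reader
without the hook still gets a usable dict.
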
Using the API
-------------
The ``pbjson`` module works just like the ``json`` module. You can call
``pbjson.load``, ``pbjson.loads``, ``pbjson.dump``, and ``pbjson.dumps``.
Command-Line Tool
-----------------
After you have installed ``pbjson``, you can use the ``pbjson`` command-line tool to
convert files to or from ``pbjson``. Run ``pbjson -h`` for details.
What is Packed Binary JSON (``PBJSON``)
---------------------------------------
Packed Binary JSON is not the same as ``BSON``. ``BSON`` is a format
used primarily in MongoDB and is meant for efficient parsing. ``PBJSON``
is meant for efficient conversion from a dict or list, transmission and
conversion back to a dict or list on the other end. ``BSON`` has
explicit support for several types not available in standard JSON.
``PBJSON`` supports only those types supported by normal JSON, plus binary
data blobs and set collections.
Unlike ``BSON``, ``PBJSON`` is almost always smaller than the equivalent
JSON. Like ``BSON``, ``PBJSON`` can be very quickly encoded and decoded
since all elements are length encoded.
There are two types of tokens in ``PBJSON``: data and key. Data tokens
can be zero length fundamental types (``false``, ``true``, ``null``),
variable length fundamental types (``int``, ``float``, ``string``,
``binary``) or containers (``set``, ``array``, ``dict``).
The type for the data token is generally stored in the top 3 bits (bits
5-7). Type zero is a special type to represent the zero length
fundamental types. The lower bits indicate the actual value. These are:
Zero-length Data Types:
- 00 - false
- 01 - true
- 02 - null
All other types are variable length. If the length is between 0 and 15,
that length is stored in bits 0-3. For lengths in the 16-2047 range, bit
4 is set and bits 0-2 are combined with the next byte to make an 11-bit
length. If bits 4 and 3 are both set, then the value in bits 0-2 is
combined with the next 2 bytes to create a 19-bit length. However, if
bits 4-0 are all set, the following 4 bytes hold the entire length. So
the token plus length takes one byte (lengths 0-15), two bytes
(16-2047), three bytes (2048-458751) or five bytes
(458752-4294967295). The three-byte range stops at 458751 rather than
524287 because a value of 7 in bits 0-2 would set bits 4-0 all at once,
which is the pattern reserved for the five-byte form.
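
The length rules above can be sketched as a small Python 3 function. This is
an illustration, not the module's own code: the name ``encode_header`` is
made up, and it assumes the extra length bytes are big endian with bits 0-2
of the token byte as the most significant bits.

.. code:: python

    import struct

    def encode_header(type_bits, length):
        """Return the token byte plus any extra length bytes.

        type_bits is the token type already shifted into bits 5-7,
        for example 0x80 for a string.
        """
        if length < 0 or length > 0xFFFFFFFF:
            raise ValueError("length out of range")
        if length <= 15:
            # Bit 4 clear: the length sits in bits 0-3.
            return bytes([type_bits | length])
        if length <= 2047:
            # Bit 4 set, bit 3 clear: 3 high bits here, 8 low bits follow.
            return bytes([type_bits | 0x10 | (length >> 8), length & 0xFF])
        if length <= 458751:
            # Bits 4 and 3 set: 3 high bits here (0-6), 16 low bits follow.
            head = bytes([type_bits | 0x18 | (length >> 16)])
            return head + struct.pack(">H", length & 0xFFFF)
        # Bits 0-4 all set: the next 4 bytes hold the whole length.
        return bytes([type_bits | 0x1F]) + struct.pack(">I", length)

With these assumptions, ``encode_header(0x80, 8)`` gives the single byte
``88`` used for the 8-character string in the first example below.
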
Variable-length Data types:
- 2x - int (bytes stored big endian with leading zero bytes removed)
- 4x - negative int (bytes stored big endian as a positive number with
  leading zero bytes removed)
- 6x - float (stored as big endian double precision with trailing zero
  bytes removed)
- 8x - string
- Ax - binary
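
A minimal Python 3 sketch of the two integer tokens follows. The function
name is hypothetical, payloads longer than 15 bytes are left out to keep it
short, and encoding zero as an empty payload is an assumption that follows
from "leading zero bytes removed" rather than something stated above.

.. code:: python

    def encode_int(value):
        """Encode an int as a 2x (non-negative) or 4x (negative) token."""
        type_bits = 0x40 if value < 0 else 0x20
        magnitude = abs(value)
        # to_bytes with the minimal byte count drops leading zero bytes;
        # 0 becomes an empty payload (an assumption of this sketch).
        payload = magnitude.to_bytes((magnitude.bit_length() + 7) // 8, "big")
        if len(payload) > 15:
            raise ValueError("sketch only handles payloads of up to 15 bytes")
        return bytes([type_bits | len(payload)]) + payload

For instance, ``encode_int(3)`` yields the two bytes ``21 03``.
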
Collection types: (length is number of elements)
- Cx - array
- Ex - object
- 0C - terminated array
- 0F - terminator
The "terminated array" works a bit differently. It is for use when the
length is not known when writing begins. Instead of a length, a
terminator (0F) is written to the stream after the last element of the
array.
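
As a sketch of where this helps, the Python 3 function below writes a
terminated array from any iterable, including a generator whose length is
unknown up front. The function name is invented, only short strings are
handled, and UTF-8 string payloads are an assumption.

.. code:: python

    def encode_terminated_array(strings):
        """Encode an iterable of short strings as a terminated array."""
        out = bytearray([0x0C])           # terminated-array token
        for text in strings:
            data = text.encode("utf-8")
            if len(data) > 15:
                raise ValueError("sketch only handles strings of up to 15 bytes")
            out.append(0x80 | len(data))  # 8x string token with inline length
            out += data
        out.append(0x0F)                  # terminator
        return bytes(out)

    # The generator is consumed once; its length is never needed.
    encoded = encode_terminated_array(name for name in ("jelly", "jam"))
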
Object keys must be text and are a maximum of 127 bytes in length. They
are stored as a 7-bit length followed by the actual key. The first 128
keys are remembered by index. If the same key is used again, it can be
represented as a single byte consisting of the high bit and the index
number of the key.
In other words, if the recurring key is "toast", it should be encoded as
05 toast. The next time the key "toast" is needed, it can be encoded as
simply 80, since it was the first key.
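
The key rule can be sketched as a small Python 3 class. ``KeyTable`` is a
hypothetical name, and UTF-8 key bytes are an assumption; the module's real
encoder may be organized differently.

.. code:: python

    class KeyTable:
        """Tracks object keys so repeats can be written as one byte."""

        def __init__(self):
            self._index = {}

        def encode(self, key):
            if key in self._index:
                # High bit set plus the index of the remembered key.
                return bytes([0x80 | self._index[key]])
            data = key.encode("utf-8")
            if len(data) > 127:
                raise ValueError("keys are limited to 127 bytes")
            if len(self._index) < 128:
                # Only the first 128 distinct keys are remembered.
                self._index[key] = len(self._index)
            return bytes([len(data)]) + data

    keys = KeyTable()
    first = keys.encode("toast")    # length byte 05, then the text
    repeat = keys.encode("toast")   # single byte 80

One table is shared across the whole document, so a key first seen in one
dict can be referenced by index from any later dict.
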
Here is an example of a simple structure:
.. code:: javascript
{
"toast": true,
"burned": false,
"name": "the best",
"toppings": ["jelly","jam","butter"],
"dimensions": {
"thickness": 0.7,
"width": 4.5
}
}
::
E5 05 'toast' 01 06 'burned' 00 04 'name' 88 'the best'
08 'toppings' C3 85 'jelly' 83 'jam' 86 'butter'
0A 'dimensions' E2 09 'thickness' 61 d7 05 'width' 62 4d5d
Let's break that out:
- 00: E5 - dict with 5 elements
- 01: 05 - key with 5 characters
- 02-06: toast
- 07: 01 - true
- 08: 06 - key with 6 characters
- 09-0E: burned
- 0F: 00 - false
- 10: 04 - key with 4 characters
- 11-14: name
- 15: 88 - string with 8 characters
- 16-1D: the best
- 1E: 08 - key with 8 bytes
- 1F-26: toppings
- 27: C3 - array with 3 elements
- 28: 85 - string with 5 characters
- 29-2D: jelly
- 2E: 83 - string with 3 characters
- 2F-31: jam
- 32: 86 - string with 6 characters
- 33-38: butter
- 39: 0A - key with 10 bytes
- 3A-43: dimensions
- 44: E2 - dict with 2 elements
- 45: 09 - key with 9 characters
- 46-4E: thickness
- 4F: 61 - float with 1 byte
- 50: first byte of IEEE representation of .7. Remaining 7 bytes were all zeros.
- 51: 05 - key with 5 characters
- 52-56: width
- 57: 62 - float with 2 bytes
- 58-59: first 2 bytes of IEEE representation of 4.5. Remaining 6 bytes were all zeros.
Total 90 bytes. The tightest ``JSON`` representation requires 126 bytes.
Marshal takes 153 bytes. Pickle takes 184 bytes. BSON takes 145 bytes.
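
The JSON figure can be checked with the standard library. This sketch
assumes Python 3 and measures only the JSON size; the other figures are not
reproduced here.

.. code:: python

    import json

    toast = {
        "toast": True,
        "burned": False,
        "name": "the best",
        "toppings": ["jelly", "jam", "butter"],
        "dimensions": {"thickness": 0.7, "width": 4.5},
    }

    # separators=(",", ":") drops the spaces json.dumps adds by default.
    compact = json.dumps(toast, separators=(",", ":"))
    json_size = len(compact.encode("utf-8"))
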
Now here is an example with repeating data:
.. code:: javascript
{
"region": 3,
"countries": [
{"code": "us", "name": "United States"},
{"code": "ca", "name": "Canada"},
{"code": "mx", "name": "Mexico"}
]
}
::
E2 06 region 21 03 09 countries C3
E2 04 code 82 us 04 name 8D United States
E2 82 82 ca 83 86 Canada
E2 82 82 mx 83 86 Mexico
This breaks down thus:
- 00: E2 - dict with 2 elements
- 01: 06 - key with 6 characters
- 02-07: region
- 08: 21 - int with 1 byte
- 09: 03 - the int for 3. Only a single byte is required.
- 0A: 09 - key with 9 bytes
- 0B-13: countries
- 14: C3 - array with 3 elements
- 15: E2 - dict with 2 elements
- 16: 04 - key with 4 characters
- 17-1A: code
- 1B: 82 - string with 2 characters
- 1C-1D: us
- 1E: 04 - key with 4 characters
- 1F-22: name
- 23: 8D - string with 13 characters
- 24-30: United States
- 31: E2 - dict with 2 elements
- 32: 82 - recurring key 2. Since 'code' was the 3rd key, it has an
  index of 2.
- 33: 82 - string with 2 characters
- 34-35: ca
- 36: 83 - recurring key 3
- 37: 86 - string with 6 characters
- 38-3D: Canada
- 3E: E2 - dict with 2 elements
- 3F: 82 - recurring key 2
- 40: 82 - string with 2 characters
- 41-42: mx
- 43: 83 - recurring key 3
- 44: 86 - string with 6 characters
- 45-4A: Mexico
Total 75 bytes. The tightest ``JSON`` representation requires 123 bytes.
Marshal takes 158 bytes and Pickle takes 162. BSON takes 154 bytes.
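
To show how the recurring keys are resolved on the way back, here is a
minimal Python 3 decoder sketch that reads the byte sequence listed above.
It is not the module's decoder: the function name is made up, and it only
covers the zero-length types, ints, strings, arrays and dicts with lengths
of 0-15, assuming UTF-8 text.

.. code:: python

    def decode(data):
        """Decode a PBJSON byte string (subset of the format only)."""
        keys = []   # keys seen so far, in order of first appearance
        pos = 0

        def take(count):
            nonlocal pos
            chunk = data[pos:pos + count]
            pos += count
            return chunk

        def read_key():
            first = take(1)[0]
            if first & 0x80:
                return keys[first & 0x7F]       # recurring key by index
            key = take(first).decode("utf-8")   # new key: length, then text
            if len(keys) < 128:
                keys.append(key)
            return key

        def read_value():
            token = take(1)[0]
            if token in (0x00, 0x01, 0x02):
                return (False, True, None)[token]
            if token & 0x10:
                raise ValueError("sketch only handles lengths 0-15")
            kind, length = token & 0xE0, token & 0x0F
            if kind == 0x20:
                return int.from_bytes(take(length), "big")
            if kind == 0x40:
                return -int.from_bytes(take(length), "big")
            if kind == 0x80:
                return take(length).decode("utf-8")
            if kind == 0xC0:
                return [read_value() for _ in range(length)]
            if kind == 0xE0:
                result = {}
                for _ in range(length):
                    key = read_key()            # key comes before its value
                    result[key] = read_value()
                return result
            raise ValueError("unsupported token 0x%02x" % token)

        return read_value()

    data = (b"\xE2\x06region\x21\x03\x09countries\xC3"
            b"\xE2\x04code\x82us\x04name\x8DUnited States"
            b"\xE2\x82\x82ca\x83\x86Canada"
            b"\xE2\x82\x82mx\x83\x86Mexico")
    decoded = decode(data)

The key list is shared by all nested dicts, which is why ``82`` and ``83``
resolve to ``code`` and ``name`` in the second and third country entries.
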
``Packed Binary JSON`` is available now in the ``pbjson`` Python module.
That module includes a command line utility to convert between normal
``JSON`` files and ``PBJSON``.
Raw data
{
"_id": null,
"home_page": "https://github.com/scottkmaxwell/pbjson",
"name": "pbjson",
"maintainer": "",
"docs_url": null,
"requires_python": ">=2.5, !=3.0.*, !=3.1.*, !=3.2.*",
"maintainer_email": "",
"keywords": "",
"author": "Scott Maxwell",
"author_email": "scott@codecobblers.com",
"download_url": "https://files.pythonhosted.org/packages/da/24/e8c6eb82b07e6eaaa2d22119b479a91110d4d8695550d34637ebec45a9f8/pbjson-1.19.0.tar.gz",
"platform": "any",
"bugtrack_url": null,
"license": "MIT License",
"summary": "Packed Binary JSON encoder/decoder for Python",
"version": "1.19.0",
"project_urls": {
"Homepage": "https://github.com/scottkmaxwell/pbjson"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "da24e8c6eb82b07e6eaaa2d22119b479a91110d4d8695550d34637ebec45a9f8",
"md5": "bfe732ec1d335ceaaa78281c799c9676",
"sha256": "5e803ad54f0a68626979120f2ce8e1911214c9a92915554069e581c0136f49b9"
},
"downloads": -1,
"filename": "pbjson-1.19.0.tar.gz",
"has_sig": false,
"md5_digest": "bfe732ec1d335ceaaa78281c799c9676",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=2.5, !=3.0.*, !=3.1.*, !=3.2.*",
"size": 40677,
"upload_time": "2023-08-20T18:06:41",
"upload_time_iso_8601": "2023-08-20T18:06:41.329912Z",
"url": "https://files.pythonhosted.org/packages/da/24/e8c6eb82b07e6eaaa2d22119b479a91110d4d8695550d34637ebec45a9f8/pbjson-1.19.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-08-20 18:06:41",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "scottkmaxwell",
"github_project": "pbjson",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "pbjson"
}