:Name: dirtyjson
:Version: 1.0.8
:Home page: https://github.com/codecobblers/dirtyjson
:Summary: JSON decoder for Python that can extract data from the muck
:Upload time: 2022-11-28 23:32:33
:Author: Scott Maxwell
:License: MIT License

`dirtyjson` --- JSON decoder
============================

.. dirtyjson
   Decode JSON data from dirty files.
.. Scott Maxwell <scott@codecobblers.com>

JSON (JavaScript Object Notation) <http://json.org> is a subset of JavaScript
syntax (ECMA-262 3rd edition) used as a lightweight data interchange format.

`dirtyjson` is a JSON decoder meant for extracting JSON-type data from .js
files. The returned data structure includes information about line and column
numbers, so you can output more useful error messages. The input can also
include single quotes, line comments, inline comments, dangling commas,
unquoted single-word keys, and hexadecimal and octal numbers.

The goal of `dirtyjson` is to read JSON objects out of files that are
littered with elements that do not fit the official JSON standard. By providing
line and column number contexts, a dirty JSON file can be used as source input
for a complex data parser or compiler.

`dirtyjson` exposes an API familiar to users of the standard library
`marshal` and `pickle` modules. However, `dirtyjson` provides
only the `load(s)` capability. To write JSON, use either the standard
`json` library or `simplejson`.

.. note::

   The code for `dirtyjson` is a fairly drastically rewritten version
   of the loader in `simplejson` so thanks go to Bob Ippolito of the
   `simplejson` project for providing such a nice starting point.

Development of dirtyjson happens on GitHub:
https://github.com/codecobblers/dirtyjson

Decoding JSON and getting position information::

    >>> import dirtyjson
    >>> obj = [u'foo', {u'bar': [u'baz', None, 1.0, 2]}]
    >>> d = dirtyjson.loads("""["foo", /* not fu*/ {bar: ['baz', null, 1.0, 2,]}] and then ignore this junk""")
    >>> d == obj
    True
    >>> pos = d.attributes(0)  # line/column position of first element in array
    >>> pos.line == 1
    True
    >>> pos.column == 2
    True
    >>> pos = d[1].attributes('bar')  # line/column position of 'bar' key/value pair
    >>> pos.key.line == 1
    True
    >>> pos.key.column == 22
    True
    >>> pos.value.line == 1
    True
    >>> pos.value.column == 27
    True

Decoding unicode from JSON::

    >>> dirtyjson.loads('"\\"foo\\bar"') == u'"foo\x08ar'
    True

Decoding JSON from streams::

    >>> from dirtyjson.compat import StringIO
    >>> io = StringIO('["streaming API"]')
    >>> dirtyjson.load(io)[0] == 'streaming API'
    True

Using Decimal instead of float::

    >>> import dirtyjson
    >>> from decimal import Decimal
    >>> dirtyjson.loads('1.1', parse_float=Decimal) == Decimal('1.1')
    True


Basic Usage
-----------

load(fp[, encoding[, parse_float[, parse_int[, parse_constant[, search_for_first_object]]]]])

   Performs the following translations in decoding by default:

   +---------------+-------------------------+
   | JSON          | Python                  |
   +===============+=========================+
   | object        | `AttributedDict`        |
   +---------------+-------------------------+
   | array         | `AttributedList`        |
   +---------------+-------------------------+
   | string        | unicode                 |
   +---------------+-------------------------+
   | number (int)  | int, long               |
   +---------------+-------------------------+
   | number (real) | float                   |
   +---------------+-------------------------+
   | true          | True                    |
   +---------------+-------------------------+
   | false         | False                   |
   +---------------+-------------------------+
   | null          | None                    |
   +---------------+-------------------------+

   It also understands ``NaN``, ``Infinity``, and ``-Infinity`` as their
   corresponding ``float`` values, which is outside the JSON spec.

   Deserialize *fp* (a ``.read()``-supporting file-like object containing a JSON
   document) to a Python object. `dirtyjson.Error` will be
   raised if the given document is not valid.

   If the contents of *fp* are encoded with an ASCII based encoding other than
   UTF-8 (e.g. latin-1), then an appropriate *encoding* name must be specified.
   Encodings that are not ASCII based (such as UCS-2) are not allowed; such a
   stream should be wrapped with ``codecs.getreader(encoding)(fp)``, or simply
   decoded to a `unicode` object and passed to `loads`. The default setting of
   ``'utf-8'`` is fastest and should be used whenever possible.

   If *fp.read()* returns `str`, then decoded JSON strings that contain
   only ASCII characters may be parsed as `str` for performance and
   memory reasons. If your code expects only `unicode`, the appropriate
   solution is to wrap *fp* with a reader as described above.
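The reader-wrapping advice above can be sketched with the standard library alone. This is a minimal illustration, assuming a UTF-16 byte stream built in memory with `io.BytesIO`; the sample payload and variable names are made up for the example. Note that `codecs.getreader` takes the encoding name first and returns a reader class that is then called with the stream.

```python
import codecs
import io

# Hypothetical payload: a small JSON-like snippet stored as UTF-16 bytes,
# an encoding that is not ASCII based.
raw = io.BytesIO(u'["caf\u00e9", 1]'.encode("utf-16"))

# codecs.getreader(encoding) returns a StreamReader class; calling it with the
# byte stream gives a file-like object whose read() returns decoded text.
reader = codecs.getreader("utf-16")(raw)
text = reader.read()
```

The wrapped `reader` (or the decoded `text`) is the kind of object the paragraph above suggests handing to `load` (or `loads`).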

   *parse_float*, if specified, will be called with the string of every JSON
   float to be decoded. By default, this is equivalent to ``float(num_str)``.
   This allows another datatype or parser to be used for JSON floats
   (e.g. `decimal.Decimal`).

   *parse_int*, if specified, will be called with the integer value of every
   JSON int to be decoded (see the note below). By default, this is equivalent
   to ``int(num_str)``. This allows another datatype or parser to be used for
   JSON integers (e.g. `float`).

   .. note::

      Unlike the standard `json` module, `dirtyjson` always does
      ``int(num_str, 0)`` before passing the result to the converter passed in
      as the *parse_int* parameter. This is to enable automatic handling of hex
      and octal numbers.
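As a small illustration of what base ``0`` means in the note above, the sketch below uses only Python's built-in `int`. It does not call `dirtyjson`, and it makes no claim about which literal spellings the decoder's tokenizer accepts before this conversion; the sample strings are chosen for the example.

```python
# Base 0 tells int() to infer the radix from the literal's prefix.
samples = ["42", "0x1F", "0o17", "-0x10"]
parsed = [int(s, 0) for s in samples]
print(parsed)  # [42, 31, 15, -16]

# A parse_int-style converter would then receive the already-parsed integer;
# float is the converter mentioned in the text above.
converted = [float(n) for n in parsed]
```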

   *parse_constant*, if specified, will be called with one of the following
   strings: ``true``, ``false``, ``null``, ``'-Infinity'``, ``'Infinity'``,
   ``'NaN'``. This can be used to raise an exception if invalid JSON numbers are
   encountered or to provide alternate values for any of these constants.

   *search_for_first_object*, if ``True``, will cause the parser to search for
   the first occurrence of either ``{`` or ``[``. This is very useful for
   reading an object from a JavaScript file.
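To make the idea of searching for the first ``{`` or ``[`` concrete, here is a naive standard-library sketch. The helper name `first_object_offset` and the sample source are hypothetical, and this is not the decoder's own implementation: for instance, it does not try to skip brackets that appear inside comments or string literals.

```python
import re


def first_object_offset(text):
    """Return the index of the first '{' or '[' in text, or -1 if absent."""
    match = re.search(r"[{\[]", text)
    return match.start() if match else -1


js_source = "var config = {name: 'demo', ports: [80, 443]};"
offset = first_object_offset(js_source)
print(offset, js_source[offset])  # 13 {
```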

loads(s[, encoding[, parse_float[, parse_int[, parse_constant[, search_for_first_object[, start_index]]]]]])

   Deserialize *s* (a `str` or `unicode` instance containing a JSON
   document) to a Python object. `dirtyjson.Error` will be
   raised if the given JSON document is not valid.

   If *s* is a `str` instance and is encoded with an ASCII based encoding
   other than UTF-8 (e.g. latin-1), then an appropriate *encoding* name must be
   specified. Encodings that are not ASCII based (such as UCS-2) are not
   allowed and should be decoded to `unicode` first.

   If *s* is a `str`, then decoded JSON strings that contain
   only ASCII characters may be parsed as `str` for performance and
   memory reasons. If your code expects only `unicode`, the appropriate
   solution is to decode *s* to `unicode` prior to calling `loads`.

   *start_index*, if non-zero, will cause the parser to start processing from
   the specified offset, while maintaining the correct line and column numbers.
   This is very useful for reading an object from the middle of a JavaScript
   file.

   The other arguments have the same meaning as in `load`.

Exceptions
----------

dirtyjson.Error(msg, doc, pos)

    Subclass of `ValueError` with the following additional attributes:

    msg

        The unformatted error message

    doc

        The JSON document being parsed

    pos

        The start index of doc where parsing failed

    lineno

        The line corresponding to pos

    colno

        The column corresponding to pos
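The relationship between ``pos`` and the ``lineno``/``colno`` pair can be sketched as follows. This is an illustration under the assumption that lines and columns are both counted from 1; the helper name `line_and_column` and the sample document are hypothetical and are not taken from the library's source.

```python
def line_and_column(doc, pos):
    """Map a 0-based character offset to a 1-based (line, column) pair."""
    line = doc.count("\n", 0, pos) + 1
    last_newline = doc.rfind("\n", 0, pos)  # -1 when pos is on the first line
    column = pos - last_newline
    return line, column


doc = '{\n  "a": 1,\n  "b": oops\n}'
pos = doc.index("oops")
print(line_and_column(doc, pos))  # (3, 8)
```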

AttributedDict and AttributedList
---------------------------------

The `dirtyjson` module uses `AttributedDict` and
`AttributedList` instead of ``dict`` and ``list``. Each is a subclass of its
base type (``dict`` or ``list``) and can be used as if it were the standard
class, but each has been enhanced to store attributes with each element. We
use those attributes to store line and column numbers. You can use that
information to refer users back to the exact location in the original
source file.

Position()

   This is a very simple utility class that contains ``line`` and ``column``.
   It is used for storing the position attributes for `AttributedList`
   and `KeyValuePosition`.

KeyValuePosition()

   This is another very simple utility class that contains ``key`` and
   ``value``. Each of those is a `Position` object specifying the
   location in the original source string/file of the key and value. It is used
   for storing the position attributes for `AttributedDict`.

AttributedDict()

   A subclass of ``dict`` that behaves exactly like a ``dict`` except that it
   maintains order like an ``OrderedDict`` and allows storing attributes for
   each key/value pair.

   add_with_attributes(self, key, value, attributes)

      Set the *key* in the underlying ``dict`` to the *value* and also store
      whatever is passed in as *attributes* for later retrieval. In our case,
      we store `KeyValuePosition`.

   attributes(self, key)

      Return the attributes associated with the specified *key* or ``None`` if
      no attributes exist for the key. In our case, we store
      `KeyValuePosition`. Retrieve position info like this::

         pos = d.attributes(key)
         key_line = pos.key.line
         key_column = pos.key.column
         value_line = pos.value.line
         value_column = pos.value.column

AttributedList()

   A subclass of ``list`` that behaves exactly like a ``list`` except that it
   allows storing attributes for each value.

   append(self, value, attributes=None)

      Appends *value* to the list and *attributes* to the associated location.
      In our case, we store `Position`.

   attributes(self, index)

      Returns the attributes for the value at the given *index*. In our case,
      we store `Position`. Retrieve position info like this::

         pos = l.attributes(index)
         value_line = pos.line
         value_column = pos.column

   .. note::

      This class is *NOT* robust. If you insert or delete items, the attributes
      will get out of sync. Making this a non-naive class would be a nice
      enhancement.
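The out-of-sync behaviour described in the note can be reproduced with a small stand-in class. `NaiveAttributedList` below is hypothetical: it assumes attributes are kept in a parallel list, which is a design chosen for illustration rather than a description of the library's actual code.

```python
class NaiveAttributedList(list):
    """Hypothetical stand-in: values in the list, attributes in a parallel list."""

    def __init__(self):
        super().__init__()
        self._attributes = []

    def append(self, value, attributes=None):
        super().append(value)
        self._attributes.append(attributes)

    def attributes(self, index):
        return self._attributes[index]


items = NaiveAttributedList()
items.append("a", "line 1")
items.append("b", "line 2")

# list.insert is not overridden, so only the values shift.
items.insert(0, "new")
print(items[1], items.attributes(1))  # a line 2
```

After the insert, the value ``"a"`` is paired with the attributes originally stored for ``"b"``, which is why position data should be read before mutating a decoded list.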



            
