# fix-busted-json

-   Version: 0.0.18
-   Summary: Fixes broken JSON string objects
-   Author: Tim Buckland
-   Requires Python: >=3.6
-   Uploaded: 2024-04-22 08:26:37

Fix broken JSON using Python.

For Python 3.6+.

This project fixes broken JSON with the following issues:

-   Missing quotes around key names
-   Wrong quotes around key names and strings
    -   Single quotes
    -   Backticks
    -   Escaped double quote
    -   Double escaped double quote
    -   "Smart" i.e. curly quotes
-   Missing commas between key-value pairs and array elements
-   Trailing comma after last key-value pair
-   Concatenation of string fields
-   Python `True`/`False`/`None` used instead of JSON `true`/`false`/`null`
-   Additional double quote at the start of a key, which gpt-3.5-turbo sometimes adds
-   Unescaped newline `\n` in a string value
-   Many cases of excessive escaping, e.g. `{\"res\": \"{ \\\"a\\\": \\\"b\\\" }\"}`

Utility functions are also provided for finding JSON objects in text.

https://github.com/Qarj/fix-busted-json

https://pypi.org/project/fix-busted-json
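To make the list of issues above concrete, the following sketch feeds one hand-written sample per issue to Python's standard `json` module and confirms that strict parsing rejects each one. The sample strings and the `strict_parser_rejects` helper are illustrative assumptions and are not part of this package.

```py
import json

# Hand-written samples (assumptions for illustration), one per issue type.
BROKEN_SAMPLES = {
    "missing quotes around key": '{ name: "John" }',
    "single quotes": "{ 'name': 'John' }",
    "missing comma": '{ "a": 1 "b": 2 }',
    "trailing comma": '{ "a": 1, }',
    "string concatenation": '{ "city": "New" + " York" }',
    "Python literal": '{ "active": True }',
}


def strict_parser_rejects(text):
    """Return True if the standard library JSON parser refuses `text`."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return True
    return False


for issue, sample in BROKEN_SAMPLES.items():
    print(f"{issue}: rejected={strict_parser_rejects(sample)}")
```

Every sample is rejected by the strict parser, which is why a repair step is needed before such text can be loaded.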

## Quickstart

```sh
pip install fix-busted-json
```

Make a file called `example_repair_json.py`:

```py
#!/usr/bin/env python3

from fix_busted_json import repair_json

invalid_json = "{ name: 'John' 'age': 30, 'city': 'New' + ' York', }"

fixed_json = repair_json(invalid_json)

print(fixed_json)
```

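Note the issues in the invalid JSON:

-   The `name` key is unquoted
-   Single quotes are used, but the JSON spec requires double quotes
-   The comma between `'John'` and `'age'` is missing
-   String fields are concatenated, which is not allowed in JSON
-   There is a trailing comma after the last key-value pair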

Run it:

```sh
python example_repair_json.py
```

Output:

```json
{ "name": "John", "age": 30, "city": "New York" }
```

## Why

This project was originally developed to find JSON-like objects in log files and pretty-print them.

More recently this project has been used to find and then fix broken JSON created by large language models such as `gpt-3.5-turbo` and `gpt-4`.

For example, a large language model might output a completion like the following:

```txt
Thought: "I need to search for developer jobs in London"
Action: SearchTool
ActionInput: { location: "London", 'title': "developer" }
```

Recovering this JSON object with this project takes a single call:

```py
#!/usr/bin/env python3

from fix_busted_json import first_json

completion = """Thought: "I need to search for developer jobs in London"
Action: SearchTool
ActionInput: { location: "London", 'title': "developer" }
"""

print(first_json(completion))
```

Output:

```json
{ "location": "London", "title": "developer" }
```
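Once the object has been recovered, it can be handed to the standard `json` module like any other JSON text. The sketch below assumes that `first_json` returns a string and, to stay self-contained, pastes in the output shown above instead of calling the library.

```py
import json

# Copied from the output above (assumption: first_json returns this text as a str).
repaired = '{ "location": "London", "title": "developer" }'

# Strict parsing now succeeds and yields a regular Python dict.
action_input = json.loads(repaired)

print(action_input["location"])
print(action_input["title"])
```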

## API

### repair_json

Takes a broken JSON string and returns the repaired JSON.

```py
#!/usr/bin/env python3

from fix_busted_json import repair_json

invalid_json = "{ name: 'John' }"

fixed_json = repair_json(invalid_json)
```
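A common pattern is to try strict parsing first and only repair when it fails. The sketch below shows that pattern with the repair step injected as a callable so the block runs with the standard library alone. The `parse_lenient` helper and the `demo_repair` stand-in are hypothetical names introduced for illustration.

```py
import json


def parse_lenient(text, repair):
    """Parse `text` strictly; on failure, repair it once and parse again.

    `repair` is any callable that maps a broken JSON string to a fixed one.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return json.loads(repair(text))


# Stand-in repairer for this demo only; it handles just this one input shape.
def demo_repair(text):
    return text.replace("name:", '"name":').replace("'", '"')


print(parse_lenient('{ "name": "John" }', demo_repair))  # already valid
print(parse_lenient("{ name: 'John' }", demo_repair))    # needs repair
```

With the package installed, `repair_json` would presumably be passed in place of `demo_repair`.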

### log_jsons

Looks for JSON objects in text and logs them, also recursively logging any JSON objects found in the values of the top-level JSON object.

```py
#!/usr/bin/env python3

from fix_busted_json import log_jsons

log_jsons("""some text { key1: true, 'key2': "  { inner: 'value', } " } text { a: 1 } text""")
```

Running it gives output:

```txt
some text
{
  "key1": true,
  "key2": "  { inner: 'value', } "
}

FOUND JSON found in key key2 --->

{
  "inner": "value"
}


 text
{
  "a": 1
}
 text
```
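The pretty-printed layout above is the same one the standard `json` module produces with `indent=2`. The sketch below reproduces the first object from a repaired string typed in by hand. Whether `log_jsons` uses `json.dumps` internally is an assumption.

```py
import json

# Repaired form of the first object in the text (typed by hand for this demo).
repaired = '{ "key1": true, "key2": "  { inner: \'value\', } " }'

# Parse, then re-serialize with two-space indentation.
pretty = json.dumps(json.loads(repaired), indent=2)

print(pretty)
```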

### to_array_of_plain_strings_or_json

Breaks text into an array of plain strings and JSON objects.

```py
#!/usr/bin/env python3

from fix_busted_json import to_array_of_plain_strings_or_json

result = to_array_of_plain_strings_or_json("""some text { key1: true, 'key2': "  { inner: 'value', } " } text { a: 1 } text""")

print(result)
```

Gives output:

```txt
['some text ', '{ "key1": true, "key2": "  { inner: \'value\', } " }', ' text ', '{ "a": 1 }', ' text']
```
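The returned list mixes plain text and JSON strings, so a caller usually has to tell the two apart. The sketch below does this by attempting a strict parse of each element. It works on the list copied from the output above, and the `split_parts` helper is a hypothetical name, not part of the package.

```py
import json

# Copied from the output above.
parts = [
    'some text ',
    '{ "key1": true, "key2": "  { inner: \'value\', } " }',
    ' text ',
    '{ "a": 1 }',
    ' text',
]


def split_parts(parts):
    """Separate a mixed list into parsed JSON values and plain strings."""
    objects, plain = [], []
    for part in parts:
        try:
            objects.append(json.loads(part))
        except json.JSONDecodeError:
            plain.append(part)
    return objects, plain


objects, plain = split_parts(parts)
print(objects)
print(plain)
```

One caveat of this approach: a plain string that happens to be valid JSON on its own, such as `42`, would be classified as JSON.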

### first_json, last_json, largest_json, json_matching

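Utility functions for finding JSON objects in text. In the example below, `first_json` and `last_json` return the first and last object found, `largest_json` returns the longest one, and `json_matching` returns an object that matches the given compiled regular expression.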

```py
#!/usr/bin/env python3
import re
from fix_busted_json import first_json, last_json, largest_json, json_matching

jsons = "text { first: 123 } etc { second_example: 456 } etc { third: 789 } { fourth: 12 }"

print(first_json(jsons))
print(last_json(jsons))
print(largest_json(jsons))
print(json_matching(jsons, re.compile("thi")))
```

Output:

```txt
{ "first": 123 }
{ "fourth": 12 }
{ "second_example": 456 }
{ "third": 789 }
```
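The selection rules suggested by this output can be sketched with the standard library on a list of repaired objects. This is not the library's implementation: the candidate list is copied from the output above, and it is an assumption that `largest_json` compares string length and that `json_matching` returns the first match.

```py
import re

# Repaired objects in the order they appear in the text.
candidates = [
    '{ "first": 123 }',
    '{ "second_example": 456 }',
    '{ "third": 789 }',
    '{ "fourth": 12 }',
]

first = candidates[0]
last = candidates[-1]
largest = max(candidates, key=len)  # longest string wins

pattern = re.compile("thi")
# First candidate in which the pattern is found anywhere, or None.
matching = next((c for c in candidates if pattern.search(c)), None)

print(first)
print(last)
print(largest)
print(matching)
```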

## See also

Node version of this project: https://www.npmjs.com/package/log-parsed-json

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "fix-busted-json",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": null,
    "keywords": null,
    "author": "Tim Buckland",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/7d/20/378d67dd0246f8d4f34902ac9a65dd81c627f77d6cea65cb21a4c34379ec/fix-busted-json-0.0.18.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Fixes broken JSON string objects",
    "version": "0.0.18",
    "project_urls": null,
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "4860dd88b9688821079e92a0ed015779f11a65576218d525948be3148b81b86e",
                "md5": "54a53fbab30b27625a6699e666ce1154",
                "sha256": "fdce0e02c9a810b3aa28e1c3c32c24b21b44e89f6315ec25d2b963bd52a6ef03"
            },
            "downloads": -1,
            "filename": "fix_busted_json-0.0.18-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "54a53fbab30b27625a6699e666ce1154",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 7358,
            "upload_time": "2024-04-22T08:26:35",
            "upload_time_iso_8601": "2024-04-22T08:26:35.946069Z",
            "url": "https://files.pythonhosted.org/packages/48/60/dd88b9688821079e92a0ed015779f11a65576218d525948be3148b81b86e/fix_busted_json-0.0.18-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7d20378d67dd0246f8d4f34902ac9a65dd81c627f77d6cea65cb21a4c34379ec",
                "md5": "1e9a76a6b7086ecfe72769415191f260",
                "sha256": "93c5dab7cae3b5d0b055f2c7043f9fe727a88a80d0be753c5f2c20bb9b69672f"
            },
            "downloads": -1,
            "filename": "fix-busted-json-0.0.18.tar.gz",
            "has_sig": false,
            "md5_digest": "1e9a76a6b7086ecfe72769415191f260",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 10491,
            "upload_time": "2024-04-22T08:26:37",
            "upload_time_iso_8601": "2024-04-22T08:26:37.341296Z",
            "url": "https://files.pythonhosted.org/packages/7d/20/378d67dd0246f8d4f34902ac9a65dd81c627f77d6cea65cb21a4c34379ec/fix-busted-json-0.0.18.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-04-22 08:26:37",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "fix-busted-json"
}
        