html-to-json


- Name: html-to-json
- Version: 2.0.0
- Home page: https://github.com/fhightower/html-to-json
- Summary: Convert html to json.
- Upload time: 2021-02-27 17:34:50
- Author: Floyd Hightower
- License: MIT License
- Keywords: html to json, html, json, conversion
- Requirements: bs4

# HTML to JSON

[![PyPI](https://img.shields.io/pypi/v/html-to-json.svg)](https://pypi.python.org/pypi/html-to-json)
[![Build Status](https://travis-ci.com/fhightower/html-to-json.svg?branch=main)](https://travis-ci.com/fhightower/html-to-json)
[![codecov](https://codecov.io/gh/fhightower/html-to-json/branch/main/graph/badge.svg?token=V0WOIXRGMM)](https://codecov.io/gh/fhightower/html-to-json)

Convert HTML and/or HTML tables to JSON.

## Installation

```
pip install html-to-json
```

## Usage

### HTML to JSON

```python
import html_to_json

html_string = """<head>
    <title>Test site</title>
    <meta charset="UTF-8"></head>"""
output_json = html_to_json.convert(html_string)
print(output_json)
```

When calling `html_to_json.convert`, you can skip capturing the text values of the HTML elements by passing the keyword argument `capture_element_values=False`. Likewise, you can skip capturing the elements' attributes by passing `capture_element_attributes=False`.

#### Example

Example input:

```html
<head>
    <title>Floyd Hightower's Projects</title>
    <meta charset="UTF-8">
    <meta name="description" content="Floyd Hightower&#39;s Projects">
    <meta name="keywords" content="projects,fhightower,Floyd,Hightower">
</head>
```

Example output:

```json
{
    "head": [
    {
        "title": [
        {
            "_value": "Floyd Hightower's Projects"
        }],
        "meta": [
        {
            "_attributes":
            {
                "charset": "UTF-8"
            }
        },
        {
            "_attributes":
            {
                "name": "description",
                "content": "Floyd Hightower's Projects"
            }
        },
        {
            "_attributes":
            {
                "name": "keywords",
                "content": "projects,fhightower,Floyd,Hightower"
            }
        }]
    }]
}
```
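
The nesting in the example output can be hard to read at first: each tag name maps to a list of elements, and each element holds `_value` and/or `_attributes`. The sketch below walks that structure using only the standard library. It pastes the example output as a JSON string instead of calling the library, and the names `head`, `title`, and `meta_by_name` are illustrative.

```python
import json

# The example output above, pasted as JSON so this sketch runs on its own.
example_output = json.loads("""
{
    "head": [{
        "title": [{"_value": "Floyd Hightower's Projects"}],
        "meta": [
            {"_attributes": {"charset": "UTF-8"}},
            {"_attributes": {"name": "description", "content": "Floyd Hightower's Projects"}},
            {"_attributes": {"name": "keywords", "content": "projects,fhightower,Floyd,Hightower"}}
        ]
    }]
}
""")

# Every tag name maps to a list, so index into the first <head> element.
head = example_output["head"][0]

# Text content is stored under "_value".
title = head["title"][0]["_value"]

# Attributes are stored under "_attributes"; collect the named <meta> tags.
meta_by_name = {
    meta["_attributes"]["name"]: meta["_attributes"]["content"]
    for meta in head["meta"]
    if "name" in meta.get("_attributes", {})
}

print(title)
print(meta_by_name["keywords"].split(","))
```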

### HTML Tables to JSON

In addition to converting HTML to JSON, this library can also convert HTML tables to JSON, taking into account where the table headers (if any) are located.

Currently, this library can handle three types of tables:

- A. Those with [table headers](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/th) in the first row
- B. Those with table headers in the first column
- C. Those without table headers

Tables of types A and B are diagrammed below:

![This package can handle tables with the headers in the first row or headers in the first column](./html_table_varieties.jpg)

#### Example

This code:

```python
import html_to_json

html_string = """<table>
    <tr>
        <th>#</th>
        <th>Malware</th>
        <th>MD5</th>
        <th>Date Added</th>
    </tr>

    <tr>
        <td>25548</td>
        <td><a href="/stats/DarkComet/">DarkComet</a></td>
        <td><a href="/config/034a37b2a2307f876adc9538986d7b86">034a37b2a2307f876adc9538986d7b86</a></td>
        <td>July 9, 2018, 6:25 a.m.</td>
    </tr>

    <tr>
        <td>25547</td>
        <td><a href="/stats/DarkComet/">DarkComet</a></td>
        <td><a href="/config/706eeefbac3de4d58b27d964173999c3">706eeefbac3de4d58b27d964173999c3</a></td>
        <td>July 7, 2018, 6:25 a.m.</td>
    </tr></table>"""
tables = html_to_json.convert_tables(html_string)
print(tables)
```

will produce this output:

```json
[
    [
        {
            "#": "25548",
            "Malware": "DarkComet",
            "MD5": "034a37b2a2307f876adc9538986d7b86",
            "Date Added": "July 9, 2018, 6:25 a.m."
        }, {
            "#": "25547",
            "Malware": "DarkComet",
            "MD5": "706eeefbac3de4d58b27d964173999c3",
            "Date Added": "July 7, 2018, 6:25 a.m."
        }
    ]
]
```
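
The result is a list of tables, and each table of this kind is a list of row dictionaries keyed by header. As a small sketch of working with that structure, the snippet below pastes the example output instead of calling the library. The names `first_table` and `md5_by_id` are illustrative.

```python
import json

# The example output above, pasted so this sketch runs without the library.
tables = [
    [
        {
            "#": "25548",
            "Malware": "DarkComet",
            "MD5": "034a37b2a2307f876adc9538986d7b86",
            "Date Added": "July 9, 2018, 6:25 a.m.",
        },
        {
            "#": "25547",
            "Malware": "DarkComet",
            "MD5": "706eeefbac3de4d58b27d964173999c3",
            "Date Added": "July 7, 2018, 6:25 a.m.",
        },
    ]
]

# Each entry in the outer list corresponds to one <table> in the input.
first_table = tables[0]

# Build a lookup from the "#" column to the "MD5" column.
md5_by_id = {row["#"]: row["MD5"] for row in first_table}
print(md5_by_id["25548"])

# json.dumps turns the Python object into JSON text.
print(json.dumps(tables, indent=4))
```

Note that `print(tables)` shows Python's own representation of the lists and dictionaries; `json.dumps` is what produces JSON text like the output shown above.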

## Credits

This package was created with [Cookiecutter](https://github.com/audreyr/cookiecutter) and fhightower's [Python project template](https://github.com/fhightower-templates/python-project-template).

## Raw data

{
    "_id": null,
    "home_page": "https://github.com/fhightower/html-to-json",
    "name": "html-to-json",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "html to json,html,json,conversion",
    "author": "Floyd Hightower",
    "author_email": "",
    "download_url": "https://files.pythonhosted.org/packages/da/83/c425c27e4c8f4b622901f8b58ad48e53be14a080d341a70c67570f1ec30a/html_to_json-2.0.0.tar.gz",
    "platform": "",
    "bugtrack_url": null,
    "license": "MIT License",
    "summary": "Convert html to json.",
    "version": "2.0.0",
    "project_urls": {
        "CI": "https://travis-ci.com/fhightower/html-to-json.svg?branch=main",
        "Changelog": "https://github.com/fhightower/html-to-json/blob/main/CHANGELOG.md",
        "Documentation": "https://github.com/fhightower/html-to-json",
        "Homepage": "https://github.com/fhightower/html-to-json",
        "PyPi": "https://pypi.org/project/html-to-json/",
        "Say Thanks!": "https://saythanks.io/to/floyd.hightower27%40gmail.com",
        "Source": "https://github.com/fhightower/html-to-json",
        "Tracker": "https://github.com/fhightower/html-to-json/issues"
    },
    "split_keywords": [
        "html to json",
        "html",
        "json",
        "conversion"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "5a79aa64abd13c010a02c3cc61f970295357fb0a65505eb096f7c03a2e7cdebd",
                "md5": "730212b353bec354b16c5249a66704c1",
                "sha256": "707ba86390ac05cf59d36a106f3d3da34b6075a245ee597d4c6c06ca9a6d0898"
            },
            "downloads": -1,
            "filename": "html_to_json-2.0.0-py2.py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "730212b353bec354b16c5249a66704c1",
            "packagetype": "bdist_wheel",
            "python_version": "py2.py3",
            "requires_python": null,
            "size": 6440,
            "upload_time": "2021-02-27T17:34:49",
            "upload_time_iso_8601": "2021-02-27T17:34:49.757242Z",
            "url": "https://files.pythonhosted.org/packages/5a/79/aa64abd13c010a02c3cc61f970295357fb0a65505eb096f7c03a2e7cdebd/html_to_json-2.0.0-py2.py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "da83c425c27e4c8f4b622901f8b58ad48e53be14a080d341a70c67570f1ec30a",
                "md5": "3435ba0c28a24aa9d273cc05799c91a7",
                "sha256": "3fc848f40618f444f8e9971f88a22fef041d0cb4569464de018dcf8e3c37669e"
            },
            "downloads": -1,
            "filename": "html_to_json-2.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "3435ba0c28a24aa9d273cc05799c91a7",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 54197,
            "upload_time": "2021-02-27T17:34:50",
            "upload_time_iso_8601": "2021-02-27T17:34:50.824940Z",
            "url": "https://files.pythonhosted.org/packages/da/83/c425c27e4c8f4b622901f8b58ad48e53be14a080d341a70c67570f1ec30a/html_to_json-2.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2021-02-27 17:34:50",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "fhightower",
    "github_project": "html-to-json",
    "travis_ci": true,
    "coveralls": true,
    "github_actions": true,
    "requirements": [
        {
            "name": "bs4",
            "specs": []
        }
    ],
    "lcname": "html-to-json"
}