html-to-json-enhanced


- Name: html-to-json-enhanced
- Version: 1.0.5
- Home page: https://github.com/fhightower/html-to-json
- Summary: Convert html to json.
- Upload time: 2023-04-01 13:13:49
- Author: Marvin Zhang
- License: MIT License
- Keywords: html to json, html, json, conversion
- Requirements: No requirements were recorded.

# HTML to JSON

[![PyPI](https://img.shields.io/pypi/v/html-to-json.svg)](https://pypi.python.org/pypi/html-to-json)
[![Build Status](https://travis-ci.com/fhightower/html-to-json.svg?branch=main)](https://travis-ci.com/fhightower/html-to-json)
[![codecov](https://codecov.io/gh/fhightower/html-to-json/branch/main/graph/badge.svg?token=V0WOIXRGMM)](https://codecov.io/gh/fhightower/html-to-json)

Convert HTML and/or HTML tables to JSON.

## Installation

```
pip install html-to-json-enhanced
```

## Usage

### HTML to JSON

```python
import html_to_json_enhanced

html_string = """<head>
    <title>Test site</title>
    <meta charset="UTF-8"></head>"""
output_json = html_to_json_enhanced.convert(html_string)
print(output_json)
```

When calling the `html_to_json_enhanced.convert` function, you can skip capturing the text values of the HTML elements by passing the keyword argument `capture_element_values=False`. You can likewise skip capturing the elements' attributes by passing `capture_element_attributes=False`.

#### Example

Example input:

```html
<head>
    <title>Floyd Hightower's Projects</title>
    <meta charset="UTF-8">
    <meta name="description" content="Floyd Hightower&#39;s Projects">
    <meta name="keywords" content="projects,fhightower,Floyd,Hightower">
</head>
```

Example output:

```json
{
    "head": [
    {
        "title": [
        {
            "_value": "Floyd Hightower's Projects"
        }],
        "meta": [
        {
            "_attributes":
            {
                "charset": "UTF-8"
            }
        },
        {
            "_attributes":
            {
                "name": "description",
                "content": "Floyd Hightower's Projects"
            }
        },
        {
            "_attributes":
            {
                "name": "keywords",
                "content": "projects,fhightower,Floyd,Hightower"
            }
        }]
    }]
}
```
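
The output nests every element in a list under its tag name, with text under `_value` and attributes under `_attributes`. As a minimal sketch of how such a structure could be searched, the snippet below defines a hypothetical helper, `find_elements`, which is not part of this library, and runs it on the example output above written as a Python literal.

```python
def find_elements(node, tag):
    """Recursively collect every element stored under the key `tag`.

    Hypothetical helper for illustration; it is not provided by the library.
    """
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == tag and isinstance(value, list):
                found.extend(value)
            # Keep descending so nested matches are found as well.
            found.extend(find_elements(value, tag))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_elements(item, tag))
    return found


# The example output above, written as a Python literal.
converted = {
    "head": [
        {
            "title": [{"_value": "Floyd Hightower's Projects"}],
            "meta": [
                {"_attributes": {"charset": "UTF-8"}},
                {"_attributes": {"name": "description",
                                 "content": "Floyd Hightower's Projects"}},
                {"_attributes": {"name": "keywords",
                                 "content": "projects,fhightower,Floyd,Hightower"}},
            ],
        }
    ]
}

title = find_elements(converted, "title")[0]["_value"]

# Map each named <meta> element to its content, skipping ones without a name.
meta_by_name = {
    m["_attributes"]["name"]: m["_attributes"]["content"]
    for m in find_elements(converted, "meta")
    if "name" in m.get("_attributes", {})
}

print(title)
print(meta_by_name["keywords"])
```

Because each tag maps to a list, repeated elements such as the three `meta` tags stay in document order and can be filtered by their attributes.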

### HTML Tables to JSON

In addition to converting HTML to JSON, this library can also intelligently convert HTML tables to JSON.

Currently, this library can handle three types of tables:

- A. Those with [table headers](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/th) in the first row
- B. Those with table headers in the first column
- C. Those without table headers

Tables of types A and B are diagrammed below:

![This package can handle tables with the headers in the first row or headers in the first column](./html_table_varieties.jpg)
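
To make the three table types concrete, here is a rough sketch of how they could be told apart. It is not the library's actual detection logic: the `classify_table` function and the representation of a table as rows of `(tag, text)` cells are assumptions made for this illustration only.

```python
def classify_table(rows):
    """Return "A", "B", or "C" for a table given as rows of (tag, text) cells.

    Illustrative only; the library may decide this differently.
    """
    if not rows or not rows[0]:
        return "C"
    # Type A: every cell in the first row is a <th>.
    if all(tag == "th" for tag, _ in rows[0]):
        return "A"
    # Type B: every row starts with a <th> cell.
    if all(row and row[0][0] == "th" for row in rows):
        return "B"
    # Type C: no header cells in either position.
    return "C"


headers_in_first_row = [
    [("th", "#"), ("th", "Malware")],
    [("td", "25548"), ("td", "DarkComet")],
]
headers_in_first_column = [
    [("th", "#"), ("td", "25548")],
    [("th", "Malware"), ("td", "DarkComet")],
]
no_headers = [
    [("td", "25548"), ("td", "DarkComet")],
    [("td", "25547"), ("td", "DarkComet")],
]

print(classify_table(headers_in_first_row))
print(classify_table(headers_in_first_column))
print(classify_table(no_headers))
```

The first-row check runs first, so a table whose first row consists entirely of `th` cells is treated as type A even if its other rows also begin with a header cell.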

#### Example

This code:

```python
import html_to_json_enhanced

html_string = """<table>
    <tr>
        <th>#</th>
        <th>Malware</th>
        <th>MD5</th>
        <th>Date Added</th>
    </tr>

    <tr>
        <td>25548</td>
        <td><a href="/stats/DarkComet/">DarkComet</a></td>
        <td><a href="/config/034a37b2a2307f876adc9538986d7b86">034a37b2a2307f876adc9538986d7b86</a></td>
        <td>July 9, 2018, 6:25 a.m.</td>
    </tr>

    <tr>
        <td>25547</td>
        <td><a href="/stats/DarkComet/">DarkComet</a></td>
        <td><a href="/config/706eeefbac3de4d58b27d964173999c3">706eeefbac3de4d58b27d964173999c3</a></td>
        <td>July 7, 2018, 6:25 a.m.</td>
    </tr></table>"""
tables = html_to_json_enhanced.convert_tables(html_string)
print(tables)
```

will produce this output:

```json
[
    [
        {
            "#": "25548",
            "Malware": "DarkComet",
            "MD5": "034a37b2a2307f876adc9538986d7b86",
            "Date Added": "July 9, 2018, 6:25 a.m."
        }, {
            "#": "25547",
            "Malware": "DarkComet",
            "MD5": "706eeefbac3de4d58b27d964173999c3",
            "Date Added": "July 7, 2018, 6:25 a.m."
        }
    ]
]
```
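
The output above suggests that `convert_tables` returns one list per table, each holding one dictionary per row. Assuming these are plain Python lists and dicts, `print` would show Python's own representation rather than strict JSON text, so the standard `json` module can be used to serialize the result. The sketch below works on a literal copy of the output above rather than calling the library, and also indexes the rows by one column.

```python
import json

# Literal copy of the output above: a list of tables, each a list of row dicts.
tables = [
    [
        {
            "#": "25548",
            "Malware": "DarkComet",
            "MD5": "034a37b2a2307f876adc9538986d7b86",
            "Date Added": "July 9, 2018, 6:25 a.m.",
        },
        {
            "#": "25547",
            "Malware": "DarkComet",
            "MD5": "706eeefbac3de4d58b27d964173999c3",
            "Date Added": "July 7, 2018, 6:25 a.m.",
        },
    ]
]

# Serialize to JSON text, e.g. for writing to a file or sending over HTTP.
json_text = json.dumps(tables, indent=4)
print(json_text)

# Index the rows of the first table by their MD5 column for direct lookup.
rows_by_md5 = {row["MD5"]: row for row in tables[0]}
print(rows_by_md5["706eeefbac3de4d58b27d964173999c3"]["#"])
```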

## Credits

This package was created with [Cookiecutter](https://github.com/audreyr/cookiecutter) and fhightower's [Python project template](https://github.com/fhightower-templates/python-project-template).

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/fhightower/html-to-json",
    "name": "html-to-json-enhanced",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "html to json,html,json,conversion",
    "author": "Marvin Zhang",
    "author_email": "",
    "download_url": "https://files.pythonhosted.org/packages/2b/1d/0c17da17d0470d51b13b3e44c503485f627e6024db1d797589b54ce33e14/html-to-json-enhanced-1.0.5.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License",
    "summary": "Convert html to json.",
    "version": "1.0.5",
    "split_keywords": [
        "html to json",
        "html",
        "json",
        "conversion"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "336b52748ec261141b784fdea34b45e7b0a1e94bd106e71b1e0ae6ce9d9ecc5b",
                "md5": "06bec8aba880d72acee4b79d290ce021",
                "sha256": "8cd761912e65521f7904f55e32bd5ca965d0c9ef7117ee17e12e5429051201ce"
            },
            "downloads": -1,
            "filename": "html_to_json_enhanced-1.0.5-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "06bec8aba880d72acee4b79d290ce021",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 8639,
            "upload_time": "2023-04-01T13:13:46",
            "upload_time_iso_8601": "2023-04-01T13:13:46.655355Z",
            "url": "https://files.pythonhosted.org/packages/33/6b/52748ec261141b784fdea34b45e7b0a1e94bd106e71b1e0ae6ce9d9ecc5b/html_to_json_enhanced-1.0.5-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "2b1d0c17da17d0470d51b13b3e44c503485f627e6024db1d797589b54ce33e14",
                "md5": "437c4b56edfd79541370444261bb9bcf",
                "sha256": "3637dfdd6ae57977ce568a45777e4df970ac4482fdaec56466b66a04634ab662"
            },
            "downloads": -1,
            "filename": "html-to-json-enhanced-1.0.5.tar.gz",
            "has_sig": false,
            "md5_digest": "437c4b56edfd79541370444261bb9bcf",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 28325,
            "upload_time": "2023-04-01T13:13:49",
            "upload_time_iso_8601": "2023-04-01T13:13:49.410370Z",
            "url": "https://files.pythonhosted.org/packages/2b/1d/0c17da17d0470d51b13b3e44c503485f627e6024db1d797589b54ce33e14/html-to-json-enhanced-1.0.5.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-04-01 13:13:49",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "fhightower",
    "github_project": "html-to-json",
    "travis_ci": true,
    "coveralls": true,
    "github_actions": true,
    "requirements": [],
    "lcname": "html-to-json-enhanced"
}
        