# HTML to JSON
[![PyPI](https://img.shields.io/pypi/v/html-to-json.svg)](https://pypi.python.org/pypi/html-to-json)
[![Build Status](https://travis-ci.com/fhightower/html-to-json.svg?branch=main)](https://travis-ci.com/fhightower/html-to-json)
[![codecov](https://codecov.io/gh/fhightower/html-to-json/branch/main/graph/badge.svg?token=V0WOIXRGMM)](https://codecov.io/gh/fhightower/html-to-json)
Convert HTML and/or HTML tables to JSON.
## Installation
```
pip install html-to-json-enhanced
```
## Usage
### HTML to JSON
```python
import html_to_json_enhanced

# HTML to convert
html_string = """<head>
<title>Test site</title>
<meta charset="UTF-8"></head>"""
# convert() returns a dictionary keyed by tag name
output_json = html_to_json_enhanced.convert(html_string)
print(output_json)
```
When calling the `html_to_json_enhanced.convert` function, you can skip capturing the text values of the HTML elements by passing the keyword argument `capture_element_values=False`. You can also skip capturing the elements' attributes by passing `capture_element_attributes=False`.
#### Example
Example input:
```html
<head>
<title>Floyd Hightower's Projects</title>
<meta charset="UTF-8">
<meta name="description" content="Floyd Hightower's Projects">
<meta name="keywords" content="projects,fhightower,Floyd,Hightower">
</head>
```
Example output:
```json
{
    "head": [
        {
            "title": [
                {
                    "_value": "Floyd Hightower's Projects"
                }
            ],
            "meta": [
                {
                    "_attributes": {
                        "charset": "UTF-8"
                    }
                },
                {
                    "_attributes": {
                        "name": "description",
                        "content": "Floyd Hightower's Projects"
                    }
                },
                {
                    "_attributes": {
                        "name": "keywords",
                        "content": "projects,fhightower,Floyd,Hightower"
                    }
                }
            ]
        }
    ]
}
```
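
In the output above, each tag name maps to a list of element dictionaries, with text stored under `_value` and attributes under `_attributes`, so the result can be navigated with ordinary indexing. The following is a minimal sketch that works on the example output (pasted in as a JSON string so it runs without the library installed). The helper name `meta_by_name` is hypothetical and not part of the package.

```python
import json

# The example output from above, pasted as a JSON string so this snippet
# runs on its own. In practice this would be the return value of convert().
converted = json.loads("""
{
  "head": [{
    "title": [{"_value": "Floyd Hightower's Projects"}],
    "meta": [
      {"_attributes": {"charset": "UTF-8"}},
      {"_attributes": {"name": "description",
                       "content": "Floyd Hightower's Projects"}},
      {"_attributes": {"name": "keywords",
                       "content": "projects,fhightower,Floyd,Hightower"}}
    ]
  }]
}
""")


def meta_by_name(head):
    """Map each named <meta> tag to its content (hypothetical helper)."""
    result = {}
    for meta in head.get("meta", []):
        attrs = meta.get("_attributes", {})
        if "name" in attrs:
            result[attrs["name"]] = attrs.get("content")
    return result


head = converted["head"][0]          # each tag name maps to a list of elements
title = head["title"][0]["_value"]   # element text is stored under "_value"
metas = meta_by_name(head)

print(title)
print(metas["keywords"].split(","))
```

Using `.get()` with defaults keeps the helper tolerant of elements that have no attributes, such as the `title` entry here.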
### HTML Tables to JSON
In addition to converting HTML to JSON, this library can convert HTML tables to JSON, using the table headers (where present) as keys.
Currently, this library can handle three types of tables:
A. Those with [table headers](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/th) in the first row
B. Those with table headers in the first column
C. Those without table headers
Tables of type A and B are diagrammed below:
![This package can handle tables with the headers in the first row or headers in the first column](./html_table_varieties.jpg)
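
To make the difference between layouts A and B concrete, here is a small illustrative sketch that turns a grid of cell text into records for each layout. It is not the library's implementation, and this README does not show the exact shape `convert_tables` returns for type B tables; the function names and sample data are assumptions made for illustration only.

```python
def records_from_header_row(grid):
    """Type A: the first row holds the headers; every later row is a record."""
    headers, *rows = grid
    return [dict(zip(headers, row)) for row in rows]


def records_from_header_column(grid):
    """Type B: the first cell of each row is a header; each later column is a record."""
    headers = [row[0] for row in grid]
    # Drop the header cell from every row, then transpose rows into columns.
    columns = zip(*(row[1:] for row in grid))
    return [dict(zip(headers, column)) for column in columns]


# The same data laid out both ways (made-up sample values).
type_a = [["Name", "Port"], ["ssh", "22"], ["http", "80"]]
type_b = [["Name", "ssh", "http"], ["Port", "22", "80"]]

print(records_from_header_row(type_a))
print(records_from_header_column(type_b))
```

Both calls produce the same list of records, which shows that the two layouts carry the same information with the axes swapped.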
#### Example
This code:
```python
import html_to_json_enhanced
html_string = """<table>
<tr>
<th>#</th>
<th>Malware</th>
<th>MD5</th>
<th>Date Added</th>
</tr>
<tr>
<td>25548</td>
<td><a href="/stats/DarkComet/">DarkComet</a></td>
<td><a href="/config/034a37b2a2307f876adc9538986d7b86">034a37b2a2307f876adc9538986d7b86</a></td>
<td>July 9, 2018, 6:25 a.m.</td>
</tr>
<tr>
<td>25547</td>
<td><a href="/stats/DarkComet/">DarkComet</a></td>
<td><a href="/config/706eeefbac3de4d58b27d964173999c3">706eeefbac3de4d58b27d964173999c3</a></td>
<td>July 7, 2018, 6:25 a.m.</td>
</tr></table>"""
tables = html_to_json_enhanced.convert_tables(html_string)
print(tables)
```
will produce this output:
```json
[
    [
        {
            "#": "25548",
            "Malware": "DarkComet",
            "MD5": "034a37b2a2307f876adc9538986d7b86",
            "Date Added": "July 9, 2018, 6:25 a.m."
        },
        {
            "#": "25547",
            "Malware": "DarkComet",
            "MD5": "706eeefbac3de4d58b27d964173999c3",
            "Date Added": "July 7, 2018, 6:25 a.m."
        }
    ]
]
```
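
Since the result is a list of tables, each a list of row dictionaries keyed by header text, it feeds directly into standard tooling. As a rough sketch, the snippet below writes the first table from the output above to CSV using only the standard library. The data is pasted in so the example runs without the package installed, and `table_to_csv` is a hypothetical helper, not part of the library.

```python
import csv
import io

# Output of convert_tables() from the example above: a list of tables,
# each table being a list of row dictionaries keyed by header text.
tables = [
    [
        {
            "#": "25548",
            "Malware": "DarkComet",
            "MD5": "034a37b2a2307f876adc9538986d7b86",
            "Date Added": "July 9, 2018, 6:25 a.m.",
        },
        {
            "#": "25547",
            "Malware": "DarkComet",
            "MD5": "706eeefbac3de4d58b27d964173999c3",
            "Date Added": "July 7, 2018, 6:25 a.m.",
        },
    ]
]


def table_to_csv(rows):
    """Serialize one converted table (a list of dicts) as CSV text."""
    if not rows:
        return ""
    buffer = io.StringIO()
    # Use the first row's keys as the column order.
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = table_to_csv(tables[0])
print(csv_text)
```

The `csv` module quotes the "Date Added" values automatically because they contain commas.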
## Credits
This package was created with [Cookiecutter](https://github.com/audreyr/cookiecutter) and fhightower's [Python project template](https://github.com/fhightower-templates/python-project-template).
## Raw data
{
"_id": null,
"home_page": "https://github.com/fhightower/html-to-json",
"name": "html-to-json-enhanced",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "html to json,html,json,conversion",
"author": "Marvin Zhang",
"author_email": "",
"download_url": "https://files.pythonhosted.org/packages/2b/1d/0c17da17d0470d51b13b3e44c503485f627e6024db1d797589b54ce33e14/html-to-json-enhanced-1.0.5.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT License",
"summary": "Convert html to json.",
"version": "1.0.5",
"split_keywords": [
"html to json",
"html",
"json",
"conversion"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "336b52748ec261141b784fdea34b45e7b0a1e94bd106e71b1e0ae6ce9d9ecc5b",
"md5": "06bec8aba880d72acee4b79d290ce021",
"sha256": "8cd761912e65521f7904f55e32bd5ca965d0c9ef7117ee17e12e5429051201ce"
},
"downloads": -1,
"filename": "html_to_json_enhanced-1.0.5-py3-none-any.whl",
"has_sig": false,
"md5_digest": "06bec8aba880d72acee4b79d290ce021",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 8639,
"upload_time": "2023-04-01T13:13:46",
"upload_time_iso_8601": "2023-04-01T13:13:46.655355Z",
"url": "https://files.pythonhosted.org/packages/33/6b/52748ec261141b784fdea34b45e7b0a1e94bd106e71b1e0ae6ce9d9ecc5b/html_to_json_enhanced-1.0.5-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "2b1d0c17da17d0470d51b13b3e44c503485f627e6024db1d797589b54ce33e14",
"md5": "437c4b56edfd79541370444261bb9bcf",
"sha256": "3637dfdd6ae57977ce568a45777e4df970ac4482fdaec56466b66a04634ab662"
},
"downloads": -1,
"filename": "html-to-json-enhanced-1.0.5.tar.gz",
"has_sig": false,
"md5_digest": "437c4b56edfd79541370444261bb9bcf",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 28325,
"upload_time": "2023-04-01T13:13:49",
"upload_time_iso_8601": "2023-04-01T13:13:49.410370Z",
"url": "https://files.pythonhosted.org/packages/2b/1d/0c17da17d0470d51b13b3e44c503485f627e6024db1d797589b54ce33e14/html-to-json-enhanced-1.0.5.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-04-01 13:13:49",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "fhightower",
"github_project": "html-to-json",
"travis_ci": true,
"coveralls": true,
"github_actions": true,
"requirements": [],
"lcname": "html-to-json-enhanced"
}