# HTML to JSON
[![PyPI](https://img.shields.io/pypi/v/html-to-json.svg)](https://pypi.python.org/pypi/html-to-json)
[![Build Status](https://travis-ci.com/fhightower/html-to-json.svg?branch=main)](https://travis-ci.com/fhightower/html-to-json)
[![codecov](https://codecov.io/gh/fhightower/html-to-json/branch/main/graph/badge.svg?token=V0WOIXRGMM)](https://codecov.io/gh/fhightower/html-to-json)
Convert HTML and/or HTML tables to JSON.
## Installation
```
pip install html-to-json
```
## Usage
### HTML to JSON
```python
import html_to_json

html_string = """<head>
<title>Test site</title>
<meta charset="UTF-8"></head>"""

# convert() takes the HTML as a string and returns the converted structure
output_json = html_to_json.convert(html_string)
print(output_json)
```
By default, `html_to_json.convert` captures both the text value and the attributes of each element. To skip the text values, pass the keyword argument `capture_element_values=False`. To skip the element attributes, pass `capture_element_attributes=False`.
#### Example
Example input:
```html
<head>
<title>Floyd Hightower's Projects</title>
<meta charset="UTF-8">
<meta name="description" content="Floyd Hightower's Projects">
<meta name="keywords" content="projects,fhightower,Floyd,Hightower">
</head>
```
Example output:
```json
{
"head": [
{
"title": [
{
"_value": "Floyd Hightower's Projects"
}],
"meta": [
{
"_attributes":
{
"charset": "UTF-8"
}
},
{
"_attributes":
{
"name": "description",
"content": "Floyd Hightower's Projects"
}
},
{
"_attributes":
{
"name": "keywords",
"content": "projects,fhightower,Floyd,Hightower"
}
}]
}]
}
```
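The output above nests each element in a list under its tag name, with text stored under `_value` and attributes under `_attributes`. The following is a minimal sketch of reading that structure, assuming the result of `convert` behaves like a plain Python dict shaped like the example. The dict literal is copied from the example output rather than produced by a live call, and `meta_content` is a hypothetical helper, not part of `html_to_json`.

```python
import json

# Copied from the example output above. In real use this would be the
# return value of html_to_json.convert(html_string).
output_json = {
    "head": [
        {
            "title": [{"_value": "Floyd Hightower's Projects"}],
            "meta": [
                {"_attributes": {"charset": "UTF-8"}},
                {
                    "_attributes": {
                        "name": "description",
                        "content": "Floyd Hightower's Projects",
                    }
                },
                {
                    "_attributes": {
                        "name": "keywords",
                        "content": "projects,fhightower,Floyd,Hightower",
                    }
                },
            ],
        }
    ]
}


def meta_content(converted, name):
    """Return the content of the first <meta name=...> under <head>, or None.

    Hypothetical helper for illustration; not provided by html_to_json.
    """
    for head in converted.get("head", []):
        for meta in head.get("meta", []):
            attributes = meta.get("_attributes", {})
            if attributes.get("name") == name:
                return attributes.get("content")
    return None


# Every element sits in a list, so index into it before reading "_value".
title = output_json["head"][0]["title"][0]["_value"]
keywords = meta_content(output_json, "keywords").split(",")

print(title)
print(keywords)

# Serialize the structure to JSON text with the standard library.
print(json.dumps(output_json, indent=2))
```

Because repeated tags such as `meta` share one key, looping over the list is safer than indexing a fixed position when the page layout may change.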
### HTML Tables to JSON
In addition to converting HTML to JSON, this library can convert HTML tables to JSON, taking the position of the table headers into account.
Currently, this library can handle three types of tables:
A. Those with [table headers](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/th) in the first row
B. Those with table headers in the first column
C. Those without table headers
Tables of type A and B are diagrammed below:
![This package can handle tables with the headers in the first row or headers in the first column](./html_table_varieties.jpg)
#### Example
This code:
```python
import html_to_json
html_string = """<table>
<tr>
<th>#</th>
<th>Malware</th>
<th>MD5</th>
<th>Date Added</th>
</tr>
<tr>
<td>25548</td>
<td><a href="/stats/DarkComet/">DarkComet</a></td>
<td><a href="/config/034a37b2a2307f876adc9538986d7b86">034a37b2a2307f876adc9538986d7b86</a></td>
<td>July 9, 2018, 6:25 a.m.</td>
</tr>
<tr>
<td>25547</td>
<td><a href="/stats/DarkComet/">DarkComet</a></td>
<td><a href="/config/706eeefbac3de4d58b27d964173999c3">706eeefbac3de4d58b27d964173999c3</a></td>
<td>July 7, 2018, 6:25 a.m.</td>
</tr></table>"""
# convert_tables() returns a list of tables; each table here is a list of rows
tables = html_to_json.convert_tables(html_string)
print(tables)
```
will produce this output:
```json
[
[
{
"#": "25548",
"Malware": "DarkComet",
"MD5": "034a37b2a2307f876adc9538986d7b86",
"Date Added": "July 9, 2018, 6:25 a.m."
}, {
"#": "25547",
"Malware": "DarkComet",
"MD5": "706eeefbac3de4d58b27d964173999c3",
"Date Added": "July 7, 2018, 6:25 a.m."
}
]
]
```
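The output above is a list of tables, and each table is a list of rows keyed by header. As a rough sketch of working with that shape, the rows can be written out as CSV with the standard library. This assumes a table with headers in the first row (type A), where every row is a dict as shown; the `tables` literal is copied from the example output, and `table_to_csv` is a hypothetical helper, not part of `html_to_json`.

```python
import csv
import io

# Copied from the example output above. In real use this would be the
# return value of html_to_json.convert_tables(html_string).
tables = [
    [
        {
            "#": "25548",
            "Malware": "DarkComet",
            "MD5": "034a37b2a2307f876adc9538986d7b86",
            "Date Added": "July 9, 2018, 6:25 a.m.",
        },
        {
            "#": "25547",
            "Malware": "DarkComet",
            "MD5": "706eeefbac3de4d58b27d964173999c3",
            "Date Added": "July 7, 2018, 6:25 a.m.",
        },
    ]
]


def table_to_csv(rows):
    """Render a list of row dicts as CSV text.

    Hypothetical helper for illustration; assumes every row is a dict
    with the same keys as the first row.
    """
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


first_table = tables[0]
csv_text = table_to_csv(first_table)
print(csv_text)

# Rows are ordinary dicts, so filtering is a plain list comprehension.
july_9_rows = [row for row in first_table if row["Date Added"].startswith("July 9")]
print(july_9_rows)
```

The `csv` module quotes the `Date Added` values automatically because they contain commas.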
## Credits
This package was created with [Cookiecutter](https://github.com/audreyr/cookiecutter) and fhightower's [Python project template](https://github.com/fhightower-templates/python-project-template).