# HTML FOR DOCX
Convert HTML to DOCX. This project is a fork of the discontinued [pqzx/html2docx](https://github.com/pqzx/html2docx).
### How to install
`pip install html-for-docx`
### Usage
Basic usage: add formatted HTML to an existing Docx document.
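```python
from docx import Document
from html4docx import HtmlToDocx

# filename_docx: path to an existing .docx file (placeholder, set it yourself)
document = Document(filename_docx)
parser = HtmlToDocx()
html_string = '<h1>Hello world</h1>'
# add_html_to_document expects a python-docx Document, not a file name
parser.add_html_to_document(html_string, document)
document.save(filename_docx)
```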
You can also use `python-docx` to manipulate the document. Here is an example:
```python
from docx import Document
from html4docx import HtmlToDocx
document = Document()
new_parser = HtmlToDocx()
html_string = '<h1>Hello world</h1>'
new_parser.add_html_to_document(html_string, document)
document.save('your_file_name')
```
Convert files directly
```python
from html4docx import HtmlToDocx
new_parser = HtmlToDocx()
new_parser.parse_html_file(input_html_file_path, output_docx_file_path)
```
Convert HTML from a string
```python
from html4docx import HtmlToDocx
new_parser = HtmlToDocx()
docx = new_parser.parse_html_string(input_html_file_string)
```
Change table styles
Tables are not styled by default. Use the `table_style` attribute on the parser to set a table style. The style is used for all tables.
```python
from html4docx import HtmlToDocx
new_parser = HtmlToDocx()
new_parser.table_style = 'Light Shading Accent 4'
```
To add borders to tables, use the `TableGrid` style:
```python
new_parser.table_style = 'TableGrid'
```
Default table styles can be found here: https://python-docx.readthedocs.io/en/latest/user/styles-understanding.html#table-styles-in-default-template
### Why
I forked this package to fix and update it so that I could complete a task at work that involves converting HTML to DOCX. The original package could not handle it because it lacked a few features and had some bugs, so instead of creating a package from scratch, I preferred to update this one.
### Differences (fixes and new features)
**Fixes**
- Handle missing run for leading br tag | [dashingdove](https://github.com/dashingdove) from [PR](https://github.com/pqzx/html2docx/pull/53)
- Fix base64 images | [djplaner](https://github.com/djplaner) from [Issue](https://github.com/pqzx/html2docx/issues/28#issuecomment-1052736896)
- Handle img tag without src attribute | [johnjor](https://github.com/johnjor) from [PR](https://github.com/pqzx/html2docx/pull/63)
- Fix bug when any style has `!important` | [Dfop02](https://github.com/dfop02)
- Fix 'style lookup by style_id is deprecated.' | [Dfop02](https://github.com/dfop02)
**New Features**
- Add Width/Height style to images | [maifeeulasad](https://github.com/maifeeulasad) from [PR](https://github.com/pqzx/html2docx/pull/29)
- Support px, cm, pt and % for style margin-left to paragraphs | [Dfop02](https://github.com/dfop02)
- Improve performance on large tables | [dashingdove](https://github.com/dashingdove) from [PR](https://github.com/pqzx/html2docx/pull/58)
- Support for HTML Pagination | [Evilran](https://github.com/Evilran) from [PR](https://github.com/pqzx/html2docx/pull/39)
- Support Table style | [Evilran](https://github.com/Evilran) from [PR](https://github.com/pqzx/html2docx/pull/39)
- Support alternative encoding | [HebaElwazzan](https://github.com/HebaElwazzan) from [PR](https://github.com/pqzx/html2docx/pull/59)
- Refactor tests to be more consistent and rely less on 'human validation' | [Dfop02](https://github.com/dfop02)
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details