html-for-docx


| Field | Value |
| --- | --- |
| Name | html-for-docx |
| Version | 1.0.4 |
| home_page | https://github.com/dfop02/html4docx |
| Summary | Convert HTML to Docx easily and fastly |
| upload_time | 2024-08-07 00:52:38 |
| maintainer | Diogo Fernandes |
| docs_url | None |
| author | Diogo Fernandes |
| requires_python | >=3.7 |
| license | MIT |
| keywords | html, docx, office, word, convert, transform |
| requirements | beautifulsoup4, python-docx |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

# HTML FOR DOCX
Convert HTML to docx. This project is a fork of the discontinued [pqzx/html2docx](https://github.com/pqzx/html2docx).

### How to install

`pip install html-for-docx`

### Usage

Basic usage: add formatted HTML to an existing Docx

```python
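from docx import Document
from html4docx import HtmlToDocx

document = Document(filename_docx)  # filename_docx: path to an existing .docx file
parser = HtmlToDocx()
html_string = '<h1>Hello world</h1>'
# Pass the python-docx Document (not the file path) as the second argument
parser.add_html_to_document(html_string, document)
document.save(filename_docx)  # write the changes back to the same file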
```

You can also use `python-docx` to manipulate the file. Here is an example:

```python
from docx import Document
from html4docx import HtmlToDocx

document = Document()
new_parser = HtmlToDocx()

html_string = '<h1>Hello world</h1>'
new_parser.add_html_to_document(html_string, document)

document.save('your_file_name')
```

Convert files directly

```python
from html4docx import HtmlToDocx

new_parser = HtmlToDocx()
new_parser.parse_html_file(input_html_file_path, output_docx_file_path)
```

Convert an HTML string

```python
from html4docx import HtmlToDocx

new_parser = HtmlToDocx()
docx = new_parser.parse_html_string(input_html_file_string)
```
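
When the result needs to go somewhere other than a file on disk (for example, an HTTP response), the object returned by `parse_html_string` can be written to an in-memory buffer. The sketch below assumes that the returned object is a `python-docx` `Document`, whose `save` method accepts a file-like object. `html_to_docx_bytes` is a hypothetical helper name, not part of the library.

```python
from io import BytesIO


def html_to_docx_bytes(parser, html):
    """Convert an HTML string to .docx bytes using the given parser.

    Hypothetical helper: `parser` is assumed to be an HtmlToDocx instance
    whose parse_html_string() returns a python-docx Document.
    """
    document = parser.parse_html_string(html)
    buffer = BytesIO()
    document.save(buffer)  # python-docx can save to a file-like object
    return buffer.getvalue()


# Usage (requires html-for-docx to be installed):
#   from html4docx import HtmlToDocx
#   data = html_to_docx_bytes(HtmlToDocx(), '<h1>Hello world</h1>')
```

Taking the parser as an argument keeps the helper independent of how the parser is configured, so settings such as `table_style` can be applied before the call.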

Change table styles

Tables are not styled by default. Use the `table_style` attribute on the parser to set a table style. The style is used for all tables.

```python
from html4docx import HtmlToDocx

new_parser = HtmlToDocx()
new_parser.table_style = 'Light Shading Accent 4'
```

To add borders to tables, use the `TableGrid` style:

```python
new_parser.table_style = 'TableGrid'
```

The default table styles are listed in the python-docx documentation:
https://python-docx.readthedocs.io/en/latest/user/styles-understanding.html#table-styles-in-default-template
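
Because the style applies to every table the parser produces, it only needs to be set once before converting. As a rough sketch, the hypothetical helpers below build a small HTML table from Python data and convert it with a chosen style. The helper names are illustrative and not part of the library, and `parser` is assumed to be an `HtmlToDocx` instance.

```python
from html import escape


def rows_to_html_table(header, rows):
    """Render a header and data rows as a minimal HTML table string."""
    head = "".join(f"<th>{escape(str(cell))}</th>" for cell in header)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(cell))}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


def convert_table(parser, header, rows, style="TableGrid"):
    """Set the table style on the parser, then convert the generated table.

    Hypothetical helper: `parser` is assumed to expose `table_style` and
    `parse_html_string`, as shown in the examples above.
    """
    parser.table_style = style
    return parser.parse_html_string(rows_to_html_table(header, rows))


# Usage (requires html-for-docx to be installed):
#   from html4docx import HtmlToDocx
#   document = convert_table(HtmlToDocx(), ["Name", "Qty"], [["Apples", 3]])
#   document.save("table.docx")
```

Escaping each cell with `html.escape` keeps characters such as `<` and `&` in the data from being parsed as markup.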

### Why

I forked this package to fix and update it so that I could complete a task at work that involves converting HTML to docx. The original could not handle it because it lacked a few features and had some bugs, so instead of creating a package from scratch, I chose to update this one.

### Differences (fixes and new features)

**Fixes**
- Handle missing run for leading br tag | [dashingdove](https://github.com/dashingdove) from [PR](https://github.com/pqzx/html2docx/pull/53)
- Fix base64 images | [djplaner](https://github.com/djplaner) from [Issue](https://github.com/pqzx/html2docx/issues/28#issuecomment-1052736896)
- Handle img tag without src attribute | [johnjor](https://github.com/johnjor) from [PR](https://github.com/pqzx/html2docx/pull/63)
- Fix bug when any style has `!important` | [Dfop02](https://github.com/dfop02)
- Fix 'style lookup by style_id is deprecated.' | [Dfop02](https://github.com/dfop02)

**New Features**
- Add Width/Height style to images | [maifeeulasad](https://github.com/maifeeulasad) from [PR](https://github.com/pqzx/html2docx/pull/29)
- Support px, cm, pt and % for the `margin-left` style on paragraphs | [Dfop02](https://github.com/dfop02)
- Improve performance on large tables | [dashingdove](https://github.com/dashingdove) from [PR](https://github.com/pqzx/html2docx/pull/58)
- Support for HTML Pagination | [Evilran](https://github.com/Evilran) from [PR](https://github.com/pqzx/html2docx/pull/39)
- Support Table style | [Evilran](https://github.com/Evilran) from [PR](https://github.com/pqzx/html2docx/pull/39)
- Support alternative encoding | [HebaElwazzan](https://github.com/HebaElwazzan) from [PR](https://github.com/pqzx/html2docx/pull/59)
- Support colors by name | [Dfop02](https://github.com/dfop02)
- Support `font-size` given as text, e.g. small, medium, etc. | [Dfop02](https://github.com/dfop02)
- Support internal links (Anchor) | [Dfop02](https://github.com/dfop02)
- Refactor tests to be more consistent and rely less on 'human validation' | [Dfop02](https://github.com/dfop02)
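
To try several of the features listed above at once, a small test snippet can help. The sketch below builds a paragraph that uses a `margin-left` value in one of the listed units, a named color, and a textual font size. `styled_paragraph` is a hypothetical helper, and placing the color and font size on a `<span>` is an assumption about where the library reads those styles. How the result renders may depend on the installed version.

```python
import re
from html import escape

# Units the feature list above names for margin-left on paragraphs.
_MARGIN_PATTERN = re.compile(r"\d+(?:\.\d+)?(?:px|cm|pt|%)")


def styled_paragraph(text, margin_left="1cm", color="red", font_size="small"):
    """Return an HTML paragraph exercising margin-left, named color and font-size.

    Hypothetical helper for experimenting with the converter; it only
    builds a string and does not call the library itself.
    """
    if _MARGIN_PATTERN.fullmatch(margin_left) is None:
        raise ValueError("margin_left must be a number followed by px, cm, pt or %")
    span_style = f"color: {color}; font-size: {font_size}"
    return (
        f'<p style="margin-left: {margin_left}">'
        f'<span style="{span_style}">{escape(text)}</span></p>'
    )


# Usage (requires html-for-docx to be installed):
#   from html4docx import HtmlToDocx
#   document = HtmlToDocx().parse_html_string(styled_paragraph("Hello", "2cm"))
#   document.save("features.docx")
```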

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/dfop02/html4docx",
    "name": "html-for-docx",
    "maintainer": "Diogo Fernandes",
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": "dfop02@hotmail.com",
    "keywords": "html, docx, office, word, convert, transform",
    "author": "Diogo Fernandes",
    "author_email": "diogofernandesop@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/af/ad/62c2b12aa48426b0ec8bc6718cd246bf581118953d0beaa9bf340e157884/html_for_docx-1.0.4.tar.gz",
    "platform": "any",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Convert HTML to Docx easily and fastly",
    "version": "1.0.4",
    "project_urls": {
        "Bug Tracker": "https://github.com/dfop02/html4docx/issues",
        "Changelog": "https://github.com/dfop02/html4docx/blob/master/HISTORY.rst",
        "Download": "https://github.com/dfop02/html4docx/archive/1.0.4.tar.gz",
        "Homepage": "https://github.com/dfop02/html4docx",
        "Repository": "https://github.com/dfop02/html4docx"
    },
    "split_keywords": [
        "html",
        " docx",
        " office",
        " word",
        " convert",
        " transform"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "cdb5d6887d485dd8d480652eb884c903ebd82a5805422ed7fe3d146259b0c45c",
                "md5": "dfd5e25c99ae31d8ad22ecf87d30fffa",
                "sha256": "9b648fe94e9a38f0530ef4fc8296fc0677840f223f9926800b7f4470a222fdab"
            },
            "downloads": -1,
            "filename": "html_for_docx-1.0.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "dfd5e25c99ae31d8ad22ecf87d30fffa",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 18244,
            "upload_time": "2024-08-07T00:52:36",
            "upload_time_iso_8601": "2024-08-07T00:52:36.165701Z",
            "url": "https://files.pythonhosted.org/packages/cd/b5/d6887d485dd8d480652eb884c903ebd82a5805422ed7fe3d146259b0c45c/html_for_docx-1.0.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "afad62c2b12aa48426b0ec8bc6718cd246bf581118953d0beaa9bf340e157884",
                "md5": "b8cbdeb913bf266e959c2c95e4790ccc",
                "sha256": "91997baf1d0b3fe5e6213b7966a736d2ac388a4537c58c19a4ba0e420ca050de"
            },
            "downloads": -1,
            "filename": "html_for_docx-1.0.4.tar.gz",
            "has_sig": false,
            "md5_digest": "b8cbdeb913bf266e959c2c95e4790ccc",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 18615,
            "upload_time": "2024-08-07T00:52:38",
            "upload_time_iso_8601": "2024-08-07T00:52:38.013678Z",
            "url": "https://files.pythonhosted.org/packages/af/ad/62c2b12aa48426b0ec8bc6718cd246bf581118953d0beaa9bf340e157884/html_for_docx-1.0.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-08-07 00:52:38",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "dfop02",
    "github_project": "html4docx",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "beautifulsoup4",
            "specs": [
                [
                    ">=",
                    "4.12.2"
                ]
            ]
        },
        {
            "name": "python-docx",
            "specs": [
                [
                    ">=",
                    "1.1.0"
                ]
            ]
        }
    ],
    "lcname": "html-for-docx"
}
        