

## Parse all contents of a docx file with `python-docx`
### Installation
```bash
python3 -m pip install docx-parser
```
### Features:
- `paragraph`: a text paragraph, with its `style_id`
- `multipart`: a paragraph that contains an image or hyperlink
- `table`: table data, including `merged_cells`
### Examples
- Command line
```bash
docx_parser --help
# parse images as files
docx_parser tests/demo.docx -D tests/media -o tests/out.file.jl
# parse images as base64 strings
docx_parser tests/demo.docx -A base64 -o tests/out.base64.jl
```
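The `.jl` extension on the output files suggests JSON Lines (one JSON value per line), though this README does not say so. Under that assumption, a minimal sketch for reading the output back in Python might look like this. `parse_json_lines` and `read_json_lines` are hypothetical helper names, not part of `docx_parser`, and the sample records are invented for illustration.

```python
import json
from typing import Any, Iterable, Iterator


def parse_json_lines(lines: Iterable[str]) -> Iterator[Any]:
    """Yield one decoded JSON value per non-blank line."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


def read_json_lines(path: str) -> Iterator[Any]:
    """Read a file assumed to be JSON Lines, e.g. tests/out.file.jl."""
    with open(path, encoding="utf-8") as handle:
        yield from parse_json_lines(handle)


if __name__ == "__main__":
    # Invented sample records; the real output schema is not documented here.
    sample = ['{"type": "paragraph"}\n', "\n", '{"type": "table"}\n']
    for record in parse_json_lines(sample):
        print(record)
```

If the output turns out to use a different format, only `parse_json_lines` would need to change.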
- Python
```python
from docx_parser import DocumentParser

infile = 'tests/demo.docx'
doc = DocumentParser(infile)
# parse() yields (type, item) pairs
for _type, item in doc.parse():
    print(_type, item)
```
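Since `doc.parse()` yields `(type, item)` pairs, a quick way to get an overview of a document is to count how many items of each type it contains. The sketch below is one possible approach; `count_item_types` is a hypothetical helper, not part of `docx_parser`, and it only relies on the pair structure shown in the example above, not on what each `item` looks like.

```python
from collections import Counter
from typing import Any, Iterable, Tuple


def count_item_types(pairs: Iterable[Tuple[str, Any]]) -> Counter:
    """Count how often each type label occurs in (type, item) pairs."""
    return Counter(_type for _type, _item in pairs)


if __name__ == "__main__":
    # Stub pairs standing in for the output of DocumentParser.parse();
    # the item values are placeholders, not real parser output.
    stub = [("paragraph", None), ("table", None), ("paragraph", None)]
    print(count_item_types(stub))
```

With the package installed, the same helper could be called as `count_item_types(DocumentParser('tests/demo.docx').parse())`.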
---
### ToDo
- parse text style: color, bgcolor, font, bold, italic ...
- parse paragraph format
Raw data
{
"_id": null,
"home_page": "https://github.com/suqingdong/docx_parser",
"name": "docx-parser",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "",
"author": "suqingdong",
"author_email": "suqingdong1114@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/98/04/0838d86d1eee5052e207837d8631fcae00c7d968c990c6406a0720c7c5e6/docx_parser-1.0.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT License",
"summary": "parse all contents of a docx file with python-docx",
"version": "1.0.2",
"project_urls": {
"Homepage": "https://github.com/suqingdong/docx_parser"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "069cc954a03fd83928d1e7176e758f47620705100fd832af950b883b738bbe9f",
"md5": "59d218692c62d45252541af338c11fce",
"sha256": "21025d28663c7f1f8d3ece755f02b872c3d7814fe59018bef5fd74a6d1cddab4"
},
"downloads": -1,
"filename": "docx_parser-1.0.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "59d218692c62d45252541af338c11fce",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 5853,
"upload_time": "2023-11-22T03:01:57",
"upload_time_iso_8601": "2023-11-22T03:01:57.131414Z",
"url": "https://files.pythonhosted.org/packages/06/9c/c954a03fd83928d1e7176e758f47620705100fd832af950b883b738bbe9f/docx_parser-1.0.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "98040838d86d1eee5052e207837d8631fcae00c7d968c990c6406a0720c7c5e6",
"md5": "c0b8bfac60b51bf32a57ec42af68e64a",
"sha256": "91a9f63c7e2a34cb5ead8e05979efd685454e16a89b23f1b58167f39662df87a"
},
"downloads": -1,
"filename": "docx_parser-1.0.2.tar.gz",
"has_sig": false,
"md5_digest": "c0b8bfac60b51bf32a57ec42af68e64a",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 5262,
"upload_time": "2023-11-22T03:02:09",
"upload_time_iso_8601": "2023-11-22T03:02:09.815521Z",
"url": "https://files.pythonhosted.org/packages/98/04/0838d86d1eee5052e207837d8631fcae00c7d968c990c6406a0720c7c5e6/docx_parser-1.0.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-11-22 03:02:09",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "suqingdong",
"github_project": "docx_parser",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "docx-parser"
}