docx-parser


- Name: docx-parser
- Version: 1.0.2
- Home page: https://github.com/suqingdong/docx_parser
- Summary: parse all contents of a docx file with python-docx
- Upload time: 2023-11-22 03:02:09
- Author: suqingdong
- License: MIT License
- Requirements: no requirements were recorded

![PyPI](https://img.shields.io/pypi/v/docx_parser)
![GitHub last commit](https://img.shields.io/github/last-commit/suqingdong/docx_parser)

## Parse all contents of a docx file with `python-docx`

### Installation
```bash
python3 -m pip install docx-parser
```

### Features
- `paragraph`: text paragraph, with style_id
- `multipart`: paragraph with image or hyperlink
- `table`: table data with merged_cells

### Examples
- CMD
```bash
docx_parser --help

# parse image as file
docx_parser tests/demo.docx -D tests/media -o tests/out.file.jl

# parse image as base64 string
docx_parser tests/demo.docx -A base64 -o tests/out.base64.jl
```
- Python
```python
from docx_parser import DocumentParser

infile = 'tests/demo.docx'
doc = DocumentParser(infile)
for _type, item in doc.parse():
    print(_type, item)
```
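
Because `doc.parse()` yields `(type, item)` pairs, a quick way to inspect a document is to tally the pairs by type. The following is a minimal sketch, not part of the package: the helper names `count_item_types` and `summarize_docx` are hypothetical, and nothing is assumed about the contents of each `item`.

```python
from collections import Counter


def count_item_types(pairs):
    """Count how many items of each type an iterable of (type, item) pairs yields."""
    return Counter(_type for _type, _item in pairs)


def summarize_docx(path):
    """Tally the parsed items of a docx file by type (hypothetical helper)."""
    # Imported lazily so count_item_types works without docx-parser installed.
    from docx_parser import DocumentParser

    return count_item_types(DocumentParser(path).parse())
```

Calling `summarize_docx('tests/demo.docx')` should then return a `Counter` keyed by the type names listed under Features, such as `paragraph`, `multipart`, and `table`.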
---

### ToDo
- parse text style: color, bgcolor, font, bold, italic ...
- parse paragraph format



            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/suqingdong/docx_parser",
    "name": "docx-parser",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "",
    "author": "suqingdong",
    "author_email": "suqingdong1114@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/98/04/0838d86d1eee5052e207837d8631fcae00c7d968c990c6406a0720c7c5e6/docx_parser-1.0.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License",
    "summary": "parse all contents of a docx file with python-docx",
    "version": "1.0.2",
    "project_urls": {
        "Homepage": "https://github.com/suqingdong/docx_parser"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "069cc954a03fd83928d1e7176e758f47620705100fd832af950b883b738bbe9f",
                "md5": "59d218692c62d45252541af338c11fce",
                "sha256": "21025d28663c7f1f8d3ece755f02b872c3d7814fe59018bef5fd74a6d1cddab4"
            },
            "downloads": -1,
            "filename": "docx_parser-1.0.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "59d218692c62d45252541af338c11fce",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 5853,
            "upload_time": "2023-11-22T03:01:57",
            "upload_time_iso_8601": "2023-11-22T03:01:57.131414Z",
            "url": "https://files.pythonhosted.org/packages/06/9c/c954a03fd83928d1e7176e758f47620705100fd832af950b883b738bbe9f/docx_parser-1.0.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "98040838d86d1eee5052e207837d8631fcae00c7d968c990c6406a0720c7c5e6",
                "md5": "c0b8bfac60b51bf32a57ec42af68e64a",
                "sha256": "91a9f63c7e2a34cb5ead8e05979efd685454e16a89b23f1b58167f39662df87a"
            },
            "downloads": -1,
            "filename": "docx_parser-1.0.2.tar.gz",
            "has_sig": false,
            "md5_digest": "c0b8bfac60b51bf32a57ec42af68e64a",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 5262,
            "upload_time": "2023-11-22T03:02:09",
            "upload_time_iso_8601": "2023-11-22T03:02:09.815521Z",
            "url": "https://files.pythonhosted.org/packages/98/04/0838d86d1eee5052e207837d8631fcae00c7d968c990c6406a0720c7c5e6/docx_parser-1.0.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-11-22 03:02:09",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "suqingdong",
    "github_project": "docx_parser",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "docx-parser"
}
        