pypxml


| Field | Value |
| --- | --- |
| Name | pypxml |
| Version | 2.0 |
| Summary | A Python library for parsing, converting and modifying PageXML files. |
| Upload time | 2024-10-18 09:40:33 |
| Requires Python | >=3.11 |
| License | MIT License |
| Keywords | pagexml, xml, ocr, optical character recognition |
| Requirements | No requirements were recorded. |

# PyPXML
A Python library for parsing, converting and modifying PageXML files.

## Setup
```shell
pip install pypxml
```

### Install from source
1. Clone repository: `git clone https://github.com/jahtz/pypxml`
2. Install package: `cd pypxml && pip install .`
3. Test with `pypxml --version`
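
The check in step 3 can also be done from Python. The following is a minimal sketch that uses only the standard library; `installed_version` and `python_is_supported` are hypothetical helper names introduced here, and the 3.11 floor is taken from the package's `requires_python` metadata.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def python_is_supported(version_info=sys.version_info) -> bool:
    """Check the interpreter against the '>=3.11' requirement of pypxml 2.0."""
    return tuple(version_info[:2]) >= (3, 11)


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    print("pypxml version:", installed_version("pypxml"))
```

If `installed_version("pypxml")` prints `None`, the package is not visible to the interpreter that ran the script, which usually points to a different virtual environment than the one used for `pip install`.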

## CLI
```
pypxml [OPTIONS] COMMAND [ARGS]...
```

## API
PyPXML provides a feature-rich Python API for working with PageXML files.

### Example: Edit existing PageXML
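```python
from pypxml import PageXML, PageType

# Load an existing PageXML file
pxml = PageXML.from_xml('path_to_pagexml.xml')

# Add a new paragraph region and attach its outline coordinates
text_region = pxml.create_element(PageType.TextRegion, type='paragraph', id='tr_001')
text_region.create_element(PageType.Coords, points='1,2 3,4 5,6 ...')

# Print the type of every region on the page
for region in pxml.regions:
    print(region.type)

# Write the modified document to a new file
pxml.to_xml('path_to_output.xml')
```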
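
For orientation, the data that the example above manipulates can also be inspected with the standard library alone. The sketch below does not use PyPXML and is not how the library is implemented. It assumes the 2019-07-15 PAGE namespace, and the sample document as well as the `parse_points` and `list_regions` helpers are hypothetical, made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the file uses the 2019-07-15 PAGE schema namespace.
PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"

# Hypothetical minimal PageXML document, used only for this illustration.
SAMPLE = f"""<PcGts xmlns="{PAGE_NS}">
  <Page imageFilename="page.png" imageWidth="100" imageHeight="200">
    <TextRegion id="tr_001" type="paragraph">
      <Coords points="1,2 3,4 5,6"/>
    </TextRegion>
  </Page>
</PcGts>
"""


def parse_points(points: str) -> list[tuple[int, int]]:
    """Turn a PAGE 'points' string such as '1,2 3,4' into (x, y) tuples."""
    pairs = []
    for pair in points.split():
        x, y = pair.split(",")
        pairs.append((int(x), int(y)))
    return pairs


def list_regions(xml_text: str) -> list[tuple]:
    """Return (id, type, points) for every TextRegion directly under Page."""
    root = ET.fromstring(xml_text)
    ns = {"pc": PAGE_NS}
    result = []
    for region in root.findall("pc:Page/pc:TextRegion", ns):
        coords = region.find("pc:Coords", ns)
        points = parse_points(coords.get("points")) if coords is not None else []
        result.append((region.get("id"), region.get("type"), points))
    return result


if __name__ == "__main__":
    for region_id, region_type, points in list_regions(SAMPLE):
        print(region_id, region_type, points)
```

The `points` attribute is a space-separated list of `x,y` pairs, which is why the helper splits on whitespace first and on the comma second.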

## ZPD
Developed at the [Centre for Philology and Digitality](https://www.uni-wuerzburg.de/en/zpd/) (ZPD), [University of Würzburg](https://www.uni-wuerzburg.de/en/).

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "pypxml",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.11",
    "maintainer_email": null,
    "keywords": "PageXML, XML, OCR, optical character recognition",
    "author": null,
    "author_email": "Janik Haitz <jahtz.dev@proton.me>",
    "download_url": "https://files.pythonhosted.org/packages/7f/c6/d84d05313301b5fd4ee6bd3fe378a53543c83145070a1e1690c6a106cf7e/pypxml-2.0.tar.gz",
    "platform": null,
    "description": "# PyPXML\nA python library for parsing, converting and modifying PageXML files.\n\n## Setup\n```shell\npip install pypxml\n```\n\n### Install from source\n1. Clone repository: `git clone https://github.com/jahtz/pypxml`\n2. Install package: `cd pypxml && pip install .`\n3. Test with `pypxml --version`\n\n## CLI\n```\npypxml [OPTIONS] COMMAND [ARGS]...\n```\n\n## API\nPyXML provides a feature rich Python API for working with PageXML files.\n\n### Example: Edit existing PageXML\n```python\nfrom pypxml import PageXML, PageType\n\npxml = PageXML.from_xml('path_to_pagexml.xml')\ntext_region = pxml.create_element(PageType.TextRegion, type='paragraph', id='tr_001')\ntext_region.create_element(PageType.Coords, points='1,2 3,4 5,6 ...')\n\nfor region in pxml.regions:\n    print(region.type)\n\npxml.to_xml('path_to_output.xml')\n```\n\n## ZPD\nDeveloped at Centre for [Philology and Digitality](https://www.uni-wuerzburg.de/en/zpd/) (ZPD), [University of W\u00fcrzburg](https://www.uni-wuerzburg.de/en/).\n",
    "bugtrack_url": null,
    "license": "MIT License",
    "summary": "A python library for parsing, converting and modifying PageXML files.",
    "version": "2.0",
    "project_urls": {
        "repository": "https://github.com/jahtz/pypxml"
    },
    "split_keywords": [
        "pagexml",
        " xml",
        " ocr",
        " optical character recognition"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "dfc22a25d243e639d35342336f25277071016ad53a4ee7f475ebf34e3cb645e3",
                "md5": "f81988f932cce9ef259cc27b64eec30a",
                "sha256": "955db9039060416bce11656355030ef19235ae3c0c48c4de9e824375f81009a4"
            },
            "downloads": -1,
            "filename": "pypxml-2.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "f81988f932cce9ef259cc27b64eec30a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.11",
            "size": 11426,
            "upload_time": "2024-10-18T09:40:31",
            "upload_time_iso_8601": "2024-10-18T09:40:31.812939Z",
            "url": "https://files.pythonhosted.org/packages/df/c2/2a25d243e639d35342336f25277071016ad53a4ee7f475ebf34e3cb645e3/pypxml-2.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7fc6d84d05313301b5fd4ee6bd3fe378a53543c83145070a1e1690c6a106cf7e",
                "md5": "c10ef8b4398b0fee87497f7c08f6403a",
                "sha256": "66700050d89d3c265bc53874928f4aecb9a3e8a4f745b813b9efc75c063081c5"
            },
            "downloads": -1,
            "filename": "pypxml-2.0.tar.gz",
            "has_sig": false,
            "md5_digest": "c10ef8b4398b0fee87497f7c08f6403a",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.11",
            "size": 9290,
            "upload_time": "2024-10-18T09:40:33",
            "upload_time_iso_8601": "2024-10-18T09:40:33.034906Z",
            "url": "https://files.pythonhosted.org/packages/7f/c6/d84d05313301b5fd4ee6bd3fe378a53543c83145070a1e1690c6a106cf7e/pypxml-2.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-10-18 09:40:33",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "jahtz",
    "github_project": "pypxml",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "pypxml"
}
        