pypxml

- Version: 2.1.1
- Summary: A Python library for parsing, converting and modifying PageXML files.
- Author: Janik Haitz <jahtz.dev@proton.me>
- License: MIT License
- Requires Python: >=3.11
- Keywords: pagexml, xml, ocr, optical character recognition
- Uploaded: 2024-12-09 14:09:51

# PyPXML
A Python library for parsing, converting and modifying PageXML files.

## Setup
```shell
pip install pypxml
```

### Install from source
1. Clone repository: `git clone https://github.com/jahtz/pypxml`
2. Install package: `cd pypxml && pip install .`
3. Verify the installation: `pypxml --version`

## CLI
```
pypxml [OPTIONS] COMMAND [ARGS]...
```
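The individual `COMMAND` names are not listed in this README. If you want to drive the tool from a Python script, one option is to shell out to it. This is a minimal sketch that assumes only that a `pypxml` executable is on `PATH`. `run_pypxml` is a hypothetical helper, not part of the package, and the demo passes only the `--version` flag used in the install steps.

```python
from __future__ import annotations

import shutil
import subprocess


def run_pypxml(*args: str, executable: str = "pypxml") -> subprocess.CompletedProcess[str] | None:
    """Run the pypxml CLI with the given arguments; return None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    # capture_output collects stdout/stderr; text=True decodes them to str
    return subprocess.run([path, *args], capture_output=True, text=True, check=False)


if __name__ == "__main__":
    result = run_pypxml("--version")
    if result is None:
        print("pypxml executable not found on PATH")
    else:
        print(result.stdout.strip())
```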

## API
PyPXML provides a feature-rich Python API for working with PageXML files.

### Example: Edit existing PageXML
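```python
from pypxml import PageXML, PageType

# Load an existing PageXML file
pxml = PageXML.from_xml('path_to_pagexml.xml')

# Create a new TextRegion and give it a polygon via a Coords child element
text_region = pxml.create_element(PageType.TextRegion, type='paragraph', id='tr_001')
text_region.create_element(PageType.Coords, points='1,2 3,4 5,6 ...')

# Print the type of every region in the document
for region in pxml.regions:
    print(region.type)

# Write the modified document to a new file
pxml.to_xml('path_to_output.xml')
```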
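To see what the example does to the underlying XML, the sketch below builds the same `PcGts > Page > TextRegion > Coords` structure with the standard library's `xml.etree.ElementTree` instead of pypxml. It then parses the result back in and reads each region's `type` and `points`. It does not use or reproduce pypxml's API. The PAGE 2019-07-15 namespace URI is an assumption, since files from other schema versions use a different URI. The image attributes and the helpers `q` and `parse_points` are hypothetical. The document is also not schema-complete, because the required `Metadata` element is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Assumption: PAGE 2019-07-15 namespace; other schema versions use a different URI
NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"
ET.register_namespace("", NS)  # serialize PAGE elements without an "ns0:" prefix


def q(tag: str) -> str:
    """Return the namespace-qualified (Clark notation) name of a PAGE element."""
    return f"{{{NS}}}{tag}"


def parse_points(points: str) -> list[tuple[int, int]]:
    """Turn a Coords points string such as '1,2 3,4' into (x, y) integer tuples."""
    result = []
    for pair in points.split():
        x, y = pair.split(",")
        result.append((int(x), int(y)))
    return result


# Build a minimal (not schema-complete: no Metadata) document
root = ET.Element(q("PcGts"))
page = ET.SubElement(root, q("Page"), imageFilename="page.png", imageWidth="100", imageHeight="100")
region = ET.SubElement(page, q("TextRegion"), id="tr_001", type="paragraph")
ET.SubElement(region, q("Coords"), points="1,2 3,4 5,6")

# Serialize, then parse the text back in, mimicking a save/load round trip
xml_text = ET.tostring(root, encoding="unicode")
parsed = ET.fromstring(xml_text)

for r in parsed.iter(q("TextRegion")):
    coords = r.find(q("Coords"))
    print(r.get("type"), parse_points(coords.get("points")))
    # prints: paragraph [(1, 2), (3, 4), (5, 6)]
```

Element lookups must use the namespace-qualified name. A plain `find("TextRegion")` returns nothing on a namespaced PageXML file.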

## ZPD
Developed at the [Centre for Philology and Digitality](https://www.uni-wuerzburg.de/en/zpd/) (ZPD), [University of Würzburg](https://www.uni-wuerzburg.de/en/).
