# PyPXML
A Python library for parsing, converting, and modifying PageXML files.
## Setup
```shell
pip install pypxml
```
### Install from source
1. Clone repository: `git clone https://github.com/jahtz/pypxml`
2. Install package: `cd pypxml && pip install .`
3. Test with `pypxml --version`
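The installation can also be checked from inside Python. The sketch below uses only the standard library's `importlib.metadata`; the helper name `installed_version` is made up for this illustration and is not part of PyPXML.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed version string, or None if pypxml is missing
print(installed_version("pypxml"))
```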
## CLI
```
pypxml [OPTIONS] COMMAND [ARGS]...
```
## API
PyPXML provides a feature-rich Python API for working with PageXML files.
### Example: Edit existing PageXML
```python
from pypxml import PageXML, PageType

# Load an existing PageXML file
pxml = PageXML.from_xml('path_to_pagexml.xml')
# Add a paragraph region and its outline coordinates
text_region = pxml.create_element(PageType.TextRegion, type='paragraph', id='tr_001')
text_region.create_element(PageType.Coords, points='1,2 3,4 5,6 ...')
# Print the type of every region
for region in pxml.regions:
    print(region.type)
# Write the result to a new file
pxml.to_xml('path_to_output.xml')
```
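To try the example without real data, a small input file can be generated with the standard library alone. The following is a minimal sketch, not part of PyPXML: the helper `write_minimal_pagexml`, the image file name, the coordinates, and the choice of the 2019-07-15 PAGE schema namespace are all assumptions made for illustration, and whether PyPXML accepts such a stripped-down file has not been verified here.

```python
import xml.etree.ElementTree as ET

# Namespace of the 2019-07-15 PAGE schema (assumed; other schema versions use a different date)
PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"


def write_minimal_pagexml(path: str) -> None:
    """Write a tiny PageXML file containing a single paragraph region."""
    ET.register_namespace("", PAGE_NS)  # serialize PAGE elements without a prefix
    root = ET.Element(f"{{{PAGE_NS}}}PcGts")

    # Metadata block with placeholder values
    metadata = ET.SubElement(root, f"{{{PAGE_NS}}}Metadata")
    ET.SubElement(metadata, f"{{{PAGE_NS}}}Creator").text = "example"
    ET.SubElement(metadata, f"{{{PAGE_NS}}}Created").text = "2024-01-01T00:00:00"
    ET.SubElement(metadata, f"{{{PAGE_NS}}}LastChange").text = "2024-01-01T00:00:00"

    # One page with one text region and its outline polygon
    page = ET.SubElement(
        root,
        f"{{{PAGE_NS}}}Page",
        imageFilename="page_0001.png",  # hypothetical image name
        imageWidth="1000",
        imageHeight="1500",
    )
    region = ET.SubElement(page, f"{{{PAGE_NS}}}TextRegion", id="tr_000", type="paragraph")
    ET.SubElement(region, f"{{{PAGE_NS}}}Coords", points="10,10 990,10 990,200 10,200")

    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    write_minimal_pagexml("path_to_pagexml.xml")
```

The file name matches the input path used in the example above, so the two snippets can be run one after the other.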
## ZPD
Developed at the [Centre for Philology and Digitality](https://www.uni-wuerzburg.de/en/zpd/) (ZPD), [University of Würzburg](https://www.uni-wuerzburg.de/en/).
## Raw data
{
"_id": null,
"home_page": null,
"name": "pypxml",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.11",
"maintainer_email": null,
"keywords": "PageXML, XML, OCR, optical character recognition",
"author": null,
"author_email": "Janik Haitz <jahtz.dev@proton.me>",
"download_url": "https://files.pythonhosted.org/packages/7f/c6/d84d05313301b5fd4ee6bd3fe378a53543c83145070a1e1690c6a106cf7e/pypxml-2.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT License",
"summary": "A python library for parsing, converting and modifying PageXML files.",
"version": "2.0",
"project_urls": {
"repository": "https://github.com/jahtz/pypxml"
},
"split_keywords": [
"pagexml",
" xml",
" ocr",
" optical character recognition"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "dfc22a25d243e639d35342336f25277071016ad53a4ee7f475ebf34e3cb645e3",
"md5": "f81988f932cce9ef259cc27b64eec30a",
"sha256": "955db9039060416bce11656355030ef19235ae3c0c48c4de9e824375f81009a4"
},
"downloads": -1,
"filename": "pypxml-2.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "f81988f932cce9ef259cc27b64eec30a",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.11",
"size": 11426,
"upload_time": "2024-10-18T09:40:31",
"upload_time_iso_8601": "2024-10-18T09:40:31.812939Z",
"url": "https://files.pythonhosted.org/packages/df/c2/2a25d243e639d35342336f25277071016ad53a4ee7f475ebf34e3cb645e3/pypxml-2.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "7fc6d84d05313301b5fd4ee6bd3fe378a53543c83145070a1e1690c6a106cf7e",
"md5": "c10ef8b4398b0fee87497f7c08f6403a",
"sha256": "66700050d89d3c265bc53874928f4aecb9a3e8a4f745b813b9efc75c063081c5"
},
"downloads": -1,
"filename": "pypxml-2.0.tar.gz",
"has_sig": false,
"md5_digest": "c10ef8b4398b0fee87497f7c08f6403a",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.11",
"size": 9290,
"upload_time": "2024-10-18T09:40:33",
"upload_time_iso_8601": "2024-10-18T09:40:33.034906Z",
"url": "https://files.pythonhosted.org/packages/7f/c6/d84d05313301b5fd4ee6bd3fe378a53543c83145070a1e1690c6a106cf7e/pypxml-2.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-10-18 09:40:33",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "jahtz",
"github_project": "pypxml",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "pypxml"
}