# PyPXML
A Python library for parsing, converting, and modifying PageXML files.
## Setup
```shell
pip install pypxml
```
### Install from source
1. Clone the repository: `git clone https://github.com/jahtz/pypxml`
2. Install the package: `cd pypxml && pip install .`
3. Verify the installation: `pypxml --version`
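
The installation check can also be done from Python, for example in a setup script or CI job. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of PyPXML.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "pypxml") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"pypxml {found}" if found else "pypxml is not installed")
```

Because the lookup reads package metadata instead of importing the package, it works even when the import itself would fail.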
## CLI
```
pypxml [OPTIONS] COMMAND [ARGS]...
```
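
To call the CLI from a Python script, one option is to wrap it with `subprocess`. The sketch below assumes only that the `pypxml` executable is on `PATH` and uses the `--version` flag mentioned in the setup steps; the wrapper name `run_pypxml` is hypothetical, and no other commands or options are assumed.

```python
import shutil
import subprocess
from typing import Optional, Sequence


def run_pypxml(args: Sequence[str]) -> Optional[subprocess.CompletedProcess]:
    """Run the `pypxml` CLI with `args`; return None if it is not on PATH."""
    executable = shutil.which("pypxml")
    if executable is None:
        return None
    # Capture output as text and leave exit-code handling to the caller
    return subprocess.run(
        [executable, *args], capture_output=True, text=True, check=False
    )


if __name__ == "__main__":
    result = run_pypxml(["--version"])
    if result is None:
        print("pypxml executable not found on PATH")
    else:
        print(result.stdout.strip() or result.stderr.strip())
```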
## API
PyPXML provides a feature-rich Python API for working with PageXML files.
### Example: Edit existing PageXML
```python
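from pypxml import PageXML, PageType

# Load an existing PageXML file
pxml = PageXML.from_xml('path_to_pagexml.xml')
# Add a paragraph text region with a coordinate polygon
text_region = pxml.create_element(PageType.TextRegion, type='paragraph', id='tr_001')
text_region.create_element(PageType.Coords, points='1,2 3,4 5,6 ...')

# Print the type of each region
for region in pxml.regions:
    print(region.type)

# Write the modified document to a new file
pxml.to_xml('path_to_output.xml')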
```
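
For orientation, the elements created in the example correspond to plain PAGE XML nodes. The sketch below builds the same `TextRegion`/`Coords` structure with the standard library's `xml.etree.ElementTree` instead of PyPXML. It illustrates the file format, not how PyPXML is implemented. The namespace version (2019-07-15), the image attributes, and the helper names `format_points` and `build_page` are assumptions chosen for the example, and the resulting tree is not guaranteed to validate against the schema (for instance, `Metadata` is omitted).

```python
import xml.etree.ElementTree as ET

# Assumed schema version; PAGE XML files use several dated namespaces
NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"


def format_points(points):
    """Serialize (x, y) pairs into the 'x1,y1 x2,y2 ...' form used by Coords."""
    return " ".join(f"{x},{y}" for x, y in points)


def build_page(polygon):
    """Build a small PcGts tree containing one paragraph TextRegion."""
    ET.register_namespace("", NS)  # serialize without a namespace prefix
    root = ET.Element(f"{{{NS}}}PcGts")
    page = ET.SubElement(
        root, f"{{{NS}}}Page",
        imageFilename="page.png", imageWidth="100", imageHeight="100",
    )
    region = ET.SubElement(page, f"{{{NS}}}TextRegion", id="tr_001", type="paragraph")
    ET.SubElement(region, f"{{{NS}}}Coords", points=format_points(polygon))
    return root


if __name__ == "__main__":
    tree = build_page([(1, 2), (3, 4), (5, 6)])
    # Print the type of every TextRegion, mirroring the loop in the example above
    for region in tree.iter(f"{{{NS}}}TextRegion"):
        print(region.get("type"))
    print(ET.tostring(tree, encoding="unicode"))
```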
## ZPD
Developed at the [Centre for Philology and Digitality](https://www.uni-wuerzburg.de/en/zpd/) (ZPD), [University of Würzburg](https://www.uni-wuerzburg.de/en/).
Raw data
{
"_id": null,
"home_page": null,
"name": "pypxml",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.11",
"maintainer_email": null,
"keywords": "PageXML, XML, OCR, optical character recognition",
"author": null,
"author_email": "Janik Haitz <jahtz.dev@proton.me>",
"download_url": "https://files.pythonhosted.org/packages/15/c9/77728c73b34383b462ea947e9cb352a2502c7726360c156bcc275ebbdba8/pypxml-2.1.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT License",
"summary": "A python library for parsing, converting and modifying PageXML files.",
"version": "2.1.1",
"project_urls": {
"repository": "https://github.com/jahtz/pypxml"
},
"split_keywords": [
"pagexml",
" xml",
" ocr",
" optical character recognition"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "b63da262b52883549371400535a6c6f901f6ca6bd16c15871606d83cedd4a2a0",
"md5": "dc877e4249ee09ac8d5556dfeb09a9f8",
"sha256": "f783f43cd44af46e3e607560e9b50093e3a7ab401133d9317d23777f82f3d48a"
},
"downloads": -1,
"filename": "pypxml-2.1.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "dc877e4249ee09ac8d5556dfeb09a9f8",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.11",
"size": 14647,
"upload_time": "2024-12-09T14:09:49",
"upload_time_iso_8601": "2024-12-09T14:09:49.463592Z",
"url": "https://files.pythonhosted.org/packages/b6/3d/a262b52883549371400535a6c6f901f6ca6bd16c15871606d83cedd4a2a0/pypxml-2.1.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "15c977728c73b34383b462ea947e9cb352a2502c7726360c156bcc275ebbdba8",
"md5": "0306f763aa26b83cd0f144a9ad8bb2a7",
"sha256": "91511996d7a7df96de087d7f108d81600179b983190585e52c972c5c88c1a822"
},
"downloads": -1,
"filename": "pypxml-2.1.1.tar.gz",
"has_sig": false,
"md5_digest": "0306f763aa26b83cd0f144a9ad8bb2a7",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.11",
"size": 11750,
"upload_time": "2024-12-09T14:09:51",
"upload_time_iso_8601": "2024-12-09T14:09:51.833659Z",
"url": "https://files.pythonhosted.org/packages/15/c9/77728c73b34383b462ea947e9cb352a2502c7726360c156bcc275ebbdba8/pypxml-2.1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-09 14:09:51",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "jahtz",
"github_project": "pypxml",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "pypxml"
}