[![Unit tests](https://github.com/hnesk/browse-ocrd/workflows/Unit%20tests/badge.svg?branch=master)](https://github.com/hnesk/browse-ocrd/actions/workflows/unittest.yml)
[![Docker build](https://github.com/hnesk/browse-ocrd/actions/workflows/dockerhub.yml/badge.svg)](https://github.com/hnesk/browse-ocrd/actions/workflows/dockerhub.yml)
[![PyPI version](https://badge.fury.io/py/browse-ocrd.svg)](https://badge.fury.io/py/browse-ocrd)
# OCR-D Browser
An extensible viewer for [OCR-D](https://ocr-d.de/) [mets.xml](https://ocr-d.de/en/spec/mets) files
* [Screenshot](#screenshot)
* [Features](#features)
* [Installation](#installation)
    * [Native](#native-tested-on-ubuntu-18042004)
        * [From source](#from-source)
        * [Via pip](#via-pip)
    * [Docker](#docker)
* [Usage](#usage)
    * [Native GUI](#native-gui)
    * [Docker service](#docker-service)
* [Configuration](#configuration)
    * [Configuration file locations](#configuration-file-locations)
    * [Configuration file syntax](#configuration-file-syntax)
    * [Configuration by environment variables](#configuration-by-environment-variables)
## Screenshot
![OCRD Browser with Page and Xml view](docs/screenshot.png)
## Features
- Browse fileGrps and pages, arranging views next to each other for comparison
- PageView: Show original or derived page images with [PAGE-XML](https://ocr-d.de/en/spec/page) annotations overlay, similar to [PageViewer](https://github.com/PRImA-Research-Lab/prima-page-viewer)
- ImageView: Show original or derived images (`AlternativeImage` on any level of the structural hierarchy)
- ImageView: Show multiple images at once for different pages (horizontally) or different segments (vertically), zooming freely
- XmlView: Show raw [PAGE-XML](https://ocr-d.de/en/spec/page) with syntax highlighting, open with [PageViewer](https://github.com/PRImA-Research-Lab/prima-page-viewer)
- TextView: Show concatenated [PAGE-XML](https://ocr-d.de/en/spec/page) text annotation
- DiffView: Show a simple diff comparison between text annotations from different fileGrps
- HtmlView: Show rendered HTML comparison from [dinglehopper](https://github.com/qurator-spk/dinglehopper) evaluations
## Installation
OCR-D Browser requires Python 3.7 or higher.
### Native (tested on Ubuntu 18.04/20.04)
The native installation requires [GTK 3](https://www.gtk.org/).
In any case you need a [virtual environment](https://packaging.python.org/tutorials/installing-packages/#creating-virtual-environments) with a current `pip` version (>=20), preferably your existing OCR-D venv:
<details>
<summary>Create a current pip venv:</summary>
```bash
sudo apt install python3-pip python3-venv
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip setuptools wheel
```
</details>
#### From source
```bash
git clone https://github.com/hnesk/browse-ocrd.git
cd browse-ocrd
sudo make deps-ubuntu
make install
```
#### Via pip
```bash
sudo apt install libcairo2-dev libgirepository1.0-dev
pip install browse-ocrd
```
### Docker
If you have installed [Docker](https://docs.docker.com/get-docker/), you can build OCR-D Browser as a **web service**:

    docker build -t ocrd_browser .

Or use a prebuilt image from Docker Hub:

    docker pull hnesk/ocrd_browser

## Usage
### Native GUI
Start the app with the filesystem path to the METS file of your [OCR-D workspace](https://ocr-d.de/en/spec/glossary#workspace):
```bash
browse-ocrd ./path/to/mets.xml
```
You can still open another METS file from the UI though.
### Docker service
When running the webservice, you need to pass a directory `DATADIR` which (recursively) contains all the workspaces you want to serve.
The top entrypoint `http://localhost/` will show an index page with a link `http://localhost/browse/...` for each workspace path.
Each link will run `browse-ocrd` at that workspace in the background, and then redirect your browser to the internal [Broadway server](https://docs.gtk.org/gtk3/broadway.html), which renders the app in the web browser.
To start up, just do:

    docker run -it --rm -v DATADIR:/data -p 8085:8085 -p 8080:8080 ocrd_browser

## Configuration
### Configuration file locations
At startup, the following directories are searched for a config file named `ocrd-browser.conf`:
```python
# directories and their default values under Ubuntu 20.04
GLib.get_system_config_dirs() # '/etc/xdg/xdg-ubuntu/ocrd-browser.conf', '/etc/xdg/ocrd-browser.conf'
GLib.get_user_config_dir() # '/home/jk/.config/ocrd-browser.conf'
os.getcwd() # './ocrd-browser.conf'
```
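The lookup above can be approximated without GTK installed. The following is only a sketch: it assumes that GLib derives these directories from the XDG variables `XDG_CONFIG_DIRS` (default `/etc/xdg`) and `XDG_CONFIG_HOME` (default `~/.config`), and the function name `candidate_config_paths` is made up for illustration. It lists the candidate paths in the order shown above; it does not claim to reproduce how OCR-D Browser merges the files it finds.

```python
import os
from pathlib import Path

CONFIG_NAME = "ocrd-browser.conf"


def candidate_config_paths(env=None, cwd=None):
    """Return candidate config paths: system dirs, user dir, then the working dir.

    Hypothetical helper: approximates the GLib calls via XDG environment variables.
    """
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else Path(cwd)

    # System-wide directories, colon-separated; fall back to the XDG default.
    system_dirs = [d for d in env.get("XDG_CONFIG_DIRS", "").split(":") if d]
    if not system_dirs:
        system_dirs = ["/etc/xdg"]

    # Per-user directory; fall back to ~/.config.
    home = env.get("HOME") or str(Path.home())
    user_dir = env.get("XDG_CONFIG_HOME") or str(Path(home) / ".config")

    dirs = [Path(d) for d in system_dirs] + [Path(user_dir), cwd]
    return [d / CONFIG_NAME for d in dirs]


if __name__ == "__main__":
    for path in candidate_config_paths():
        print(path, "(exists)" if path.exists() else "(missing)")
```

Running the script prints each candidate and whether it exists, which can help when a setting does not seem to take effect.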
### Configuration file syntax
The `ocrd-browser.conf` file is an INI file with the following sections and keys:
```ini
[FileGroups]
# Preferred fileGrp names for thumbnail display in the Page Browser
# Comma separated list of regular expressions
preferredImages = OCR-D-IMG, OCR-D-IMG.*, ORIGINAL
# Each Tool has a section header [Tool XYZ]
# At the moment the only defined tool is "PageViewer"
[Tool PageViewer]
# shell commandline to execute with placeholders
commandline = /usr/bin/java -jar /home/jk/bin/JPageViewer/JPageViewer.jar --resolve-dir {workspace.directory} {file.path.absolute}
```
> Note: You can get PRImA's PageViewer from its [GitHub releases page](https://github.com/PRImA-Research-Lab/prima-page-viewer/releases).
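To illustrate what "comma separated list of regular expressions" means for `preferredImages`, here is a minimal sketch using only the standard library. The helper names and the matching strategy (full match, earlier patterns win) are assumptions made for this example; the application's own selection logic may differ.

```python
import configparser
import re

CONF = """
[FileGroups]
preferredImages = OCR-D-IMG, OCR-D-IMG.*, ORIGINAL
"""


def preferred_patterns(conf_text):
    """Parse preferredImages into a list of compiled regular expressions."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    raw = parser["FileGroups"]["preferredImages"]
    return [re.compile(part.strip()) for part in raw.split(",") if part.strip()]


def pick_file_group(available, patterns):
    """Return the first fileGrp matched by the earliest pattern, else None.

    Assumption: a pattern must match the whole fileGrp name.
    """
    for pattern in patterns:
        for name in available:
            if pattern.fullmatch(name):
                return name
    return None


if __name__ == "__main__":
    patterns = preferred_patterns(CONF)
    print(pick_file_group(["OCR-D-SEG", "OCR-D-IMG-BIN", "ORIGINAL"], patterns))
```

With these sample fileGrp names, `OCR-D-IMG` matches nothing exactly, so the second pattern `OCR-D-IMG.*` selects `OCR-D-IMG-BIN` before `ORIGINAL` is considered.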
The `commandline` string is used as a Python format string with the following keyword arguments:
* `workspace`: The current `ocrd.Workspace`. All properties are shell-escaped automatically (by `shlex.quote`).
* `file`: The current `ocrd_models.OcrdFile`. All properties are shell-escaped automatically (by `shlex.quote`). An additional property `path` provides the properties `absolute` and `relative`, so `{file.path.absolute}` is replaced by the shell-quoted absolute path of the file.
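The placeholder expansion described above can be pictured with a small stand-alone sketch. The `ShellQuoted` proxy and the `SimpleNamespace` objects below are hypothetical stand-ins for the real `ocrd.Workspace` and `ocrd_models.OcrdFile`; the sketch only demonstrates the general technique of combining `str.format` attribute lookups with `shlex.quote`, not the application's actual code.

```python
import shlex
from types import SimpleNamespace


class ShellQuoted:
    """Proxy that shell-quotes whatever attribute is finally formatted."""

    def __init__(self, target):
        self._target = target

    def __getattr__(self, name):
        # Wrap nested attributes so that {file.path.absolute} is quoted too.
        return ShellQuoted(getattr(self._target, name))

    def __format__(self, spec):
        return shlex.quote(format(str(self._target), spec))


# Stand-in objects with paths that contain spaces (illustrative values only).
workspace = SimpleNamespace(directory="/data/my workspace")
file = SimpleNamespace(
    path=SimpleNamespace(
        absolute="/data/my workspace/OCR-D-SEG/page 1.xml",
        relative="OCR-D-SEG/page 1.xml",
    )
)

template = "java -jar JPageViewer.jar --resolve-dir {workspace.directory} {file.path.absolute}"
command = template.format(workspace=ShellQuoted(workspace), file=ShellQuoted(file))

if __name__ == "__main__":
    print(command)
```

Because every placeholder is quoted before substitution, paths containing spaces or shell metacharacters reach the external tool as single arguments.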
### Configuration by environment variables
Configuration values can be set or overridden through environment variables. The variable names follow the structure `BROCRD__{SECTION}__{KEY}`, where `SECTION` and `KEY` are written in upper snake case and separated by a double underscore (`__`). If the section title contains spaces, its individual words are also separated by `__`.
Some examples:
```shell
BROCRD__FILE_GROUPS__PREFERRED_IMAGES='THUMB'
BROCRD__TOOL__PAGEVIEWER__COMMANDLINE='ls {file.path.absolute}'
```
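As a rough illustration of the naming scheme, the sketch below collects all variables with the `BROCRD__` prefix and splits the remainder on `__` into path segments. The function name `config_overrides` is invented for this example, and how OCR-D Browser maps these segments back to sections and keys is not shown here.

```python
import os

PREFIX = "BROCRD__"


def config_overrides(environ=None):
    """Map BROCRD__* variables to (segment, ...) tuples and their values."""
    environ = os.environ if environ is None else environ
    overrides = {}
    for name, value in environ.items():
        if name.startswith(PREFIX):
            # Everything after the prefix is a "__"-separated path.
            path = tuple(name[len(PREFIX):].split("__"))
            overrides[path] = value
    return overrides


if __name__ == "__main__":
    for path, value in sorted(config_overrides().items()):
        print(" / ".join(path), "=", value)
```

For the two example variables above, this yields the paths `FILE_GROUPS / PREFERRED_IMAGES` and `TOOL / PAGEVIEWER / COMMANDLINE`.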