ocrd-detectron2

name: ocrd-detectron2
version: 0.1.8
home_page: https://github.com/bertsky/ocrd_detectron2
summary: OCR-D wrapper for detectron2 based segmentation models
upload_time: 2023-06-29 12:58:55
author: Robert Sachunsky, Julian Balling
license: MIT

[![PyPI version](https://badge.fury.io/py/ocrd-detectron2.svg)](https://badge.fury.io/py/ocrd-detectron2)
[![Python test](https://github.com/bertsky/ocrd_detectron2/actions/workflows/python-app.yml/badge.svg)](https://github.com/bertsky/ocrd_detectron2/actions/workflows/python-app.yml)
[![Docker Automated build via Github Container Registry](https://github.com/bertsky/ocrd_detectron2/actions/workflows/docker-image.yml/badge.svg)](https://github.com/bertsky/ocrd_detectron2/actions/workflows/docker-image.yml)
[![Docker Automated build via Dockerhub](https://img.shields.io/docker/automated/bertsky/ocrd_detectron2.svg)](https://hub.docker.com/r/bertsky/ocrd_detectron2/tags/)

# ocrd_detectron2

    OCR-D wrapper for detectron2 based segmentation models

  * [Introduction](#introduction)
  * [Installation](#installation)
  * [Usage](#usage)
     * [OCR-D processor interface ocrd-detectron2-segment](#ocr-d-processor-interface-ocrd-detectron2-segment)
  * [Models](#models)
     * [TableBank](#tablebank)
     * [TableBank](#tablebank-1)
     * [PubLayNet](#publaynet)
     * [PubLayNet](#publaynet-1)
     * [LayoutParser](#layoutparser)
     * [PubLayNet finetuning](#publaynet-finetuning)
     * [DocBank](#docbank)
  * [Testing](#testing)
     * [Test results](#test-results)

## Introduction

This package offers [OCR-D](https://ocr-d.de)-compliant [workspace processors](https://ocr-d.de/en/spec/cli) for document layout analysis with models trained on [Detectron2](https://github.com/facebookresearch/detectron2), which implements [Faster R-CNN](https://arxiv.org/abs/1506.01497), [Mask R-CNN](https://arxiv.org/abs/1703.06870), [Cascade R-CNN](https://arxiv.org/abs/1712.00726), [Feature Pyramid Networks](https://arxiv.org/abs/1612.03144) and [Panoptic Segmentation](https://arxiv.org/abs/1801.00868), among others.

In trying to cover a broad range of third-party models, a few sacrifices have to be made: Deployment of [models](#models) may be difficult, and needs configuration. Class labels (really [PAGE-XML](https://github.com/PRImA-Research-Lab/PAGE-XML) region types) must be provided. The code itself tries to cope with panoptic and instance segmentation models (with or without masks).

It is only meant for (coarse) page segmentation into regions – no text lines, no reading order, no orientation.

## Installation

Create and activate a [virtual environment](https://packaging.python.org/tutorials/installing-packages/#creating-virtual-environments) as usual.
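
For example (a minimal sketch; the environment path `venv` is arbitrary):

    python3 -m venv venv
    source venv/bin/activate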

To install Python dependencies:

    make deps

Which is the equivalent of:

    pip install -r requirements.txt -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu113/torch1.10/index.html # for CUDA 11.3
    pip install -r requirements.txt -f https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.10/index.html # for CPU only

To install this module, then do:

    make install

Which is the equivalent of:

    pip install .

**Alternatively**, you can use the provided **Docker image** (either from [Github Container Registry](https://github.com/users/bertsky/packages/container/package/ocrd_detectron2) or from [Dockerhub](https://hub.docker.com/r/bertsky/ocrd_detectron2)):

    docker pull bertsky/ocrd_detectron2
    # or
    docker pull ghcr.io/bertsky/ocrd_detectron2


## Usage

### [OCR-D processor](https://ocr-d.de/en/spec/cli) interface `ocrd-detectron2-segment`

To be used with [PAGE-XML](https://github.com/PRImA-Research-Lab/PAGE-XML) documents in an [OCR-D](https://ocr-d.de/en/about) annotation workflow.

```
Usage: ocrd-detectron2-segment [OPTIONS]

  Detect regions with Detectron2 models

  > Use detectron2 to segment each page into regions.

  > Open and deserialize PAGE input files and their respective images.
  > Fetch a raw and a binarized image for the page frame (possibly
  > cropped and deskewed).

  > Feed the raw image into the detectron2 predictor that has been used
  > to load the given model. Then, depending on the model capabilities
  > (whether it can do panoptic segmentation or only instance
  > segmentation, whether the latter can do masks or only bounding
  > boxes), post-process the predictions:

  > - panoptic segmentation: take the provided segment label map, and
  >   apply the segment to class label map,
  > - instance segmentation: find an optimal non-overlapping set (flat
  >   map) of instances via non-maximum suppression,
  > - both: avoid overlapping pre-existing top-level regions (incremental
  >   segmentation).

  > Then extend / shrink the surviving masks to fully include / exclude
  > connected components in the foreground that are on the boundary.

  > (This describes the steps when ``postprocessing`` is `full`. A value
  > of `only-nms` will omit the morphological extension/shrinking, while
  > `only-morph` will omit the non-maximum suppression, and `none` will
  > skip all postprocessing.)

  > Finally, find the convex hull polygon for each region, and map its
  > class id to a new PAGE region type (and subtype).

  > (Does not annotate `ReadingOrder` or `TextLine`s or `@orientation`.)

  > Produce a new output file by serialising the resulting hierarchy.

Options:
  -I, --input-file-grp USE        File group(s) used as input
  -O, --output-file-grp USE       File group(s) used as output
  -g, --page-id ID                Physical page ID(s) to process
  --overwrite                     Remove existing output pages/images
                                  (with --page-id, remove only those)
  --profile                       Enable profiling
  --profile-file                  Write cProfile stats to this file. Implies --profile
  -p, --parameter JSON-PATH       Parameters, either verbatim JSON string
                                  or JSON file path
  -P, --param-override KEY VAL    Override a single JSON object key-value pair,
                                  taking precedence over --parameter
  -m, --mets URL-PATH             URL or file path of METS to process
  -w, --working-dir PATH          Working directory of local workspace
  -l, --log-level [OFF|ERROR|WARN|INFO|DEBUG|TRACE]
                                  Log level
  -C, --show-resource RESNAME     Dump the content of processor resource RESNAME
  -L, --list-resources            List names of processor resources
  -J, --dump-json                 Dump tool description as JSON and exit
  -D, --dump-module-dir           Output the 'module' directory with resources for this processor
  -h, --help                      This help message
  -V, --version                   Show version

Parameters:
   "operation_level" [string - "page"]
    hierarchy level which to predict and assign regions for
    Possible values: ["page", "table"]
   "categories" [array - REQUIRED]
    maps each category (class index) of the model to a PAGE region
    type (and @type or @custom if separated by colon), e.g.
    ['TextRegion:paragraph', 'TextRegion:heading',
    'TextRegion:floating', 'TableRegion', 'ImageRegion'] for PubLayNet;
    categories with an empty string will be skipped during prediction
   "model_config" [string - REQUIRED]
    path name of model config
   "model_weights" [string - REQUIRED]
    path name of model weights
   "min_confidence" [number - 0.5]
    confidence threshold for detections
   "postprocessing" [string - "full"]
    which postprocessing steps to enable: by default, applies a custom
    non-maximum suppression (to avoid overlaps) and morphological
    operations (using connected component analysis on the binarized
    input image to shrink or expand regions)
    Possible values: ["full", "only-nms", "only-morph", "none"]
   "debug_img" [string - "none"]
    paint an AlternativeImage which blends the input image
    and all raw decoded region candidates
    Possible values: ["none", "instance_colors", "instance_colors_only", "category_colors"]
   "device" [string - "cuda"]
    select computing device for Torch (e.g. cpu or cuda:0); will fall
    back to CPU if no GPU is available
```

Example:

    # download one preconfigured model:
    ocrd resmgr download ocrd-detectron2-segment TableBank_X152.yaml
    ocrd resmgr download ocrd-detectron2-segment TableBank_X152.pth
    # run it (setting model_config, model_weights and categories):
    ocrd-detectron2-segment -I OCR-D-BIN -O OCR-D-SEG-TAB -P categories '["TableRegion"]' -P model_config TableBank_X152.yaml -P model_weights TableBank_X152.pth -P min_confidence 0.1
    # run it (equivalent, with presets file)
    ocrd-detectron2-segment -I OCR-D-BIN -O OCR-D-SEG-TAB -p presets_TableBank_X152.json -P min_confidence 0.1 
    # download all preconfigured models
    ocrd resmgr download ocrd-detectron2-segment "*"
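
The same call can also be embedded in a larger OCR-D workflow via `ocrd process`. A minimal sketch (it assumes a binarization processor such as `ocrd-cis-ocropy-binarize` is installed; fileGrp names are examples):

    ocrd process \
      "cis-ocropy-binarize -I OCR-D-IMG -O OCR-D-BIN" \
      "detectron2-segment -I OCR-D-BIN -O OCR-D-SEG-TAB -p presets_TableBank_X152.json"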

For installation **via Docker**, usage is basically the same as above – with some modifications:

    # For data persistency, decide which host-side directories you want to mount in Docker:
    DATADIR=/host-side/path/to/data
    MODELDIR=/host-side/path/to/models
    # Either you "log in" to a container first:
    docker run -v $DATADIR:/data -v $MODELDIR:/usr/local/share/ocrd-resources -it bertsky/ocrd_detectron2 bash
    # and then can use the above commands verbatim
    ...
    # Or you spin up a new container each time,
    # which means prefixing the above commands with
    docker run -v $DATADIR:/data -v $MODELDIR:/usr/local/share/ocrd-resources bertsky/ocrd_detectron2 ...
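    # For example, a complete one-shot call could look like this
    # (assuming the workspace METS lives at $DATADIR/mets.xml, i.e. /data/mets.xml in the container,
    #  and reusing the TableBank presets downloaded above):
    docker run -v $DATADIR:/data -v $MODELDIR:/usr/local/share/ocrd-resources bertsky/ocrd_detectron2 \
        ocrd-detectron2-segment -m /data/mets.xml -I OCR-D-BIN -O OCR-D-SEG-TAB -p presets_TableBank_X152.json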


#### Debugging

If you mistrust your model, and/or this tool's additional postprocessing,
try playing with the runtime parameters:

- Set `debug_img` to some value other than `none`, e.g. `instance_colors_only`.
  This will generate an image which overlays the raw predictions with the raw image
  using Detectron2's internal visualiser. The parameter settings correspond to its
  [ColorMode](https://detectron2.readthedocs.io/en/latest/modules/utils.html#detectron2.utils.visualizer.ColorMode).
  The AlternativeImages will have `@comments="debug"`, and will also be referenced in the METS,
  which allows convenient browsing with [OCR-D Browser](https://github.com/hnesk/browse-ocrd).
  (For example, open the Page View and Image View side by side, and navigate to your output
  fileGrp on each.)
- Selectively disable postprocessing steps: from the default `full` via `only-nms` (first stage)
  or `only-morph` (second stage) to `none`.
- Lower `min_confidence` to get more candidates, raise it to get fewer (see the example below).
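
For example, to inspect a generous set of raw candidates without any postprocessing (fileGrp names and the presets file are just examples):

    ocrd-detectron2-segment -I OCR-D-BIN -O OCR-D-DEBUG -p presets_TableBank_X152.json \
        -P min_confidence 0.2 -P postprocessing none -P debug_img instance_colors_only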

## Models

Some of the following models have already been registered as known [file resources](https://ocr-d.de/en/spec/cli#processor-resources), along with parameter presets to use them conveniently.

To get a list of registered models **available for download**, do:

    ocrd resmgr list-available -e ocrd-detectron2-segment

To get a list of **already installed** models and presets, do:

    ocrd resmgr list-installed -e ocrd-detectron2-segment

To **download** a registered model (i.e. a config file and the respective weights file), do:

    ocrd resmgr download ocrd-detectron2-segment NAME.yaml
    ocrd resmgr download ocrd-detectron2-segment NAME.pth

To download more models (registered or other), see:

    ocrd resmgr download --help

To **use** a model, do:

    ocrd-detectron2-segment -P model_config NAME.yaml -P model_weights NAME.pth -P categories '[...]' ...
    ocrd-detectron2-segment -p NAME.json ... # equivalent, with presets file

To add (i.e. register) a **new model**, you first have to find:
- the classes it is trained on, so you can then define a mapping to PAGE-XML region (and subregion) types,
- a download link to the model config and model weights file. 
  Archives (zip/tar) are allowed, but then you must also specify the file paths to extract.

Assuming you have done so, then proceed as follows:

    # from local file path
    ocrd resmgr download -n path/to/model/config.yml ocrd-detectron2-segment NAME.yml
    ocrd resmgr download -n path/to/model/weights.pth ocrd-detectron2-segment NAME.pth
    # from single file URL
    ocrd resmgr download -n https://path.to/model/config.yml ocrd-detectron2-segment NAME.yml
    ocrd resmgr download -n https://path.to/model/weights.pth ocrd-detectron2-segment NAME.pth
    # from zip file URL
    ocrd resmgr download -n https://path.to/model/arch.zip -t archive -P zip-path/to/config.yml ocrd-detectron2-segment NAME.yml
    ocrd resmgr download -n https://path.to/model/arch.zip -t archive -P zip-path/to/weights.pth ocrd-detectron2-segment NAME.pth
    # create corresponding preset file
    echo '{"model_weights": "NAME.pth", "model_config": "NAME.yml", "categories": [...]}' > NAME.json
    # install preset file so it can be used everywhere (not just in CWD):
    ocrd resmgr download -n NAME.json ocrd-detectron2-segment NAME.json
    # now the new model can be used just like the preregistered models
    ocrd-detectron2-segment -p NAME.json ...


What follows is an **overview** of the **preregistered** models (i.e. available via `resmgr`).

> **Note**: These are just examples, no exhaustive search was done yet!

> **Note**: The filename suffix (.pth vs .pkl) of the weight file does matter!

### [TableBank](https://github.com/doc-analysis/TableBank)

X152-FPN [config](https://layoutlm.blob.core.windows.net/tablebank/model_zoo/detection/All_X152/All_X152.yaml)|[weights](https://layoutlm.blob.core.windows.net/tablebank/model_zoo/detection/All_X152/model_final.pth)|`["TableRegion"]`

### [TableBank](https://github.com/Psarpei/Multi-Type-TD-TSR)

X152-FPN [config](https://drive.google.com/drive/folders/1COTV5f7dEAA4Txmxy3LVfcNHiPSc4Bmp?usp=sharing)|[weights](https://drive.google.com/drive/folders/1COTV5f7dEAA4Txmxy3LVfcNHiPSc4Bmp?usp=sharing)|`["TableRegion"]`

### [PubLayNet](https://github.com/hpanwar08/detectron2)

R50-FPN [config](https://github.com/hpanwar08/detectron2/raw/master/configs/DLA_mask_rcnn_R_50_FPN_3x.yaml)|[weights](https://www.dropbox.com/sh/44ez171b2qaocd2/AAB0huidzzOXeo99QdplZRjua)|`["TextRegion:paragraph", "TextRegion:heading", "TextRegion:floating", "TableRegion", "ImageRegion"]`

R101-FPN [config](https://github.com/hpanwar08/detectron2/raw/master/configs/DLA_mask_rcnn_R_101_FPN_3x.yaml)|[weights](https://www.dropbox.com/sh/wgt9skz67usliei/AAD9n6qbsyMz1Y3CwpZpHXCpa)|`["TextRegion:paragraph", "TextRegion:heading", "TextRegion:floating", "TableRegion", "ImageRegion"]`

X101-FPN [config](https://github.com/hpanwar08/detectron2/raw/master/configs/DLA_mask_rcnn_X_101_32x8d_FPN_3x.yaml)|[weights](https://www.dropbox.com/sh/1098ym6vhad4zi6/AABe16eSdY_34KGp52W0ruwha)|`["TextRegion:paragraph", "TextRegion:heading", "TextRegion:floating", "TableRegion", "ImageRegion"]`

### [PubLayNet](https://github.com/JPLeoRX/detectron2-publaynet)

R50-FPN [config](https://github.com/facebookresearch/detectron2/blob/main/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml)|[weights](https://drive.google.com/file/d/1IbxaRd82hIrxPT4a1U61_g2vvE3zcRLO/view?usp=sharing)|`["TextRegion:paragraph", "TextRegion:heading", "TextRegion:floating", "TableRegion", "ImageRegion"]`

R101-FPN [config](https://github.com/facebookresearch/detectron2/blob/main/configs/COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml)|[weights](https://drive.google.com/file/d/17MD-FegQtFRNn4GeHqKCLaQZ6FiFrzLg/view?usp=sharing)|`["TextRegion:paragraph", "TextRegion:heading", "TextRegion:floating", "TableRegion", "ImageRegion"]`

### [LayoutParser](https://github.com/Layout-Parser/layout-parser/blob/master/src/layoutparser/models/detectron2/catalog.py)

provides different model variants of various depths for multiple datasets:
- [PubLayNet](https://github.com/ibm-aur-nlp/PubLayNet) (Medical Research Papers)
- [TableBank](https://doc-analysis.github.io/tablebank-page/index.html) (Tables Computer Typesetting)
- [PRImALayout](https://www.primaresearch.org/dataset/) (Various Computer Typesetting)  
  R50-FPN [config](https://www.dropbox.com/s/yc92x97k50abynt/config.yaml?dl=1)|[weights](https://www.dropbox.com/s/h7th27jfv19rxiy/model_final.pth?dl=1)|`["Background","TextRegion","ImageRegion","TableRegion","MathsRegion","SeparatorRegion","LineDrawingRegion"]`
- [HJDataset](https://dell-research-harvard.github.io/HJDataset/) (Historical Japanese Magazines)
- [NewspaperNavigator](https://news-navigator.labs.loc.gov/) (Historical Newspapers)
- [Math Formula Detection](http://transcriptorium.eu/~htrcontest/MathsICDAR2021/)

See [here](https://github.com/Layout-Parser/layout-parser/blob/master/docs/notes/modelzoo.md) for an overview,
and [here](https://github.com/Layout-Parser/layout-parser/blob/main/src/layoutparser/models/detectron2/catalog.py) for the model files.
You will have to adapt the label map to conform to [PAGE-XML](https://github.com/PRImA-Research-Lab/PAGE-XML)
region (sub)types accordingly.
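
For instance, once you have picked a variant from the catalog, registering it follows the same pattern as for any other new model. A sketch (the URLs `$CONFIG_URL`/`$WEIGHTS_URL` must be taken from the catalog, the resource name `LayoutParser_PubLayNet_R50` is arbitrary, and the category mapping reuses the PubLayNet mapping from above, assuming the same class order):

    ocrd resmgr download -n $CONFIG_URL ocrd-detectron2-segment LayoutParser_PubLayNet_R50.yml
    ocrd resmgr download -n $WEIGHTS_URL ocrd-detectron2-segment LayoutParser_PubLayNet_R50.pth
    echo '{"model_config": "LayoutParser_PubLayNet_R50.yml", "model_weights": "LayoutParser_PubLayNet_R50.pth", "categories": ["TextRegion:paragraph", "TextRegion:heading", "TextRegion:floating", "TableRegion", "ImageRegion"]}' > LayoutParser_PubLayNet_R50.json
    ocrd resmgr download -n LayoutParser_PubLayNet_R50.json ocrd-detectron2-segment LayoutParser_PubLayNet_R50.json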

### [PubLayNet finetuning](https://github.com/Jambo-sudo/Historical-document-layout-analysis)

(pre-trained on PubLayNet, fine-tuned on a custom, non-public GT corpus of 500 pages of 20th-century magazines)

X101-FPN [config](https://github.com/Jambo-sudo/Historical-document-layout-analysis/raw/main/historical-document-analysis/DLA_mask_rcnn_X_101_32x8d_FPN_3x.yaml)|[weights](https://www.dropbox.com/s/hfhsdpvg7jesd4g/pub_model_final.pth?dl=1)|`["TextRegion:caption","ImageRegion","TextRegion:page-number","TableRegion","TextRegion:heading","TextRegion:paragraph"]`

### [DocBank](https://github.com/doc-analysis/DocBank/blob/master/MODEL_ZOO.md)

X101-FPN [archive](https://layoutlm.blob.core.windows.net/docbank/model_zoo/X101.zip)

Proposed mappings:
- `["TextRegion:header", "TextRegion:credit", "TextRegion:caption", "TextRegion:other", "MathsRegion", "GraphicRegion", "TextRegion:footer", "TextRegion:floating", "TextRegion:paragraph", "TextRegion:endnote", "TextRegion:heading", "TableRegion", "TextRegion:heading"]` (using only predefined `@type`)
- `["TextRegion:abstract", "TextRegion:author", "TextRegion:caption", "TextRegion:date", "MathsRegion", "GraphicRegion", "TextRegion:footer", "TextRegion:list", "TextRegion:paragraph", "TextRegion:reference", "TextRegion:heading", "TableRegion", "TextRegion:title"]` (using `@custom` as well)

## Testing

To install Python dependencies and download some models:

    make deps-test

Which is the equivalent of:

    pip install -r requirements-test.txt
    make models-test

To run the tests, then do:

    make test

You can inspect the results under `test/assets/*/data` in the various new `OCR-D-SEG-*` fileGrps.
(Again, it is recommended to use [OCR-D Browser](https://github.com/hnesk/browse-ocrd).)

Finally, to remove the test data, do:

    make clean

### Test results

These tests are integrated as a [Github Action](https://github.com/bertsky/ocrd_detectron2/actions/workflows/python-app.yml). Its results can be viewed [here](https://bertsky.github.io/ocrd_detectron2/test-results).

            
