sci-annot-eval


- Name: sci-annot-eval
- Version: 0.0.9
- Home page: https://github.com/Dzeri96/sci-annot-eval
- Summary: The evaluation component of the sci-annot framework
- Upload time: 2023-08-06 20:33:54
- Author: Dzeri96
- Requires Python: >=3.9, <4
- Keywords: sci-annot, object, detection, evaluation
- Requirements: No requirements were recorded.

# ![logo](./README_assets/logo-tiny.webp) Sci-Annot Evaluation Component
[![PyPI version](https://badge.fury.io/py/sci-annot-eval.svg)](https://badge.fury.io/py/sci-annot-eval)
![Build & Test Pipeline](https://github.com/dzeri96/sci-annot-eval/actions/workflows/build-test-publish.yaml/badge.svg)

This package was developed as part of my master's thesis, where it was used in the evaluation stage.

Its main purpose is to produce per-page confusion matrices with multiple classes for predictions in the field of Page Object Detection, with inter-object dependencies also supported.
To be more precise, it was used to compare predictions in the task of figure, table and caption extraction, but the project can somewhat easily be extended to other object types.

## Features
This tool currently supports the following commands:
- `rasterize` - Rasterize all PDFs in the input folder and produce a summary Parquet file called `render_summary.parquet` in the output folder.
- `split-pdffigures2` - Take original pdffigures2 output and split it into validator-friendly per-page files.
- `benchmark` - Evaluate predictions against a ground truth and produce TP, FP, and FN metrics for each page.
- `deepfigures-predict` - Use DeepFigures to detect elements in each PDF in the input folder.
- `transpile` - Take a folder of predictions in one format and output them in another.
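
The per-page TP, FP, and FN counts that `benchmark` produces can be aggregated into dataset-level scores. The following is a minimal sketch and not part of sci-annot-eval: the function name `micro_scores` and the dictionary keys `TP`, `FP`, and `FN` are assumptions made for illustration, and the tool's actual output layout may differ.

```python
def micro_scores(pages):
    """Micro-average precision, recall and F1 over per-page counts."""
    tp = sum(page["TP"] for page in pages)
    fp = sum(page["FP"] for page in pages)
    fn = sum(page["FN"] for page in pages)
    # Guard against empty denominators (e.g. no predictions at all).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return precision, recall, f1


# Hypothetical counts for two pages of a single class.
pages = [
    {"TP": 3, "FP": 1, "FN": 0},
    {"TP": 3, "FP": 0, "FN": 2},
]
precision, recall, f1 = micro_scores(pages)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Summing the counts before dividing (micro-averaging) keeps pages with many objects from being weighted the same as pages with a single object.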

Currently, the following prediction formats are supported:
- [Sci-Annot](https://github.com/Dzeri96/sci-annot) - The corresponding annotation front-end.
- [PDFFigures 2.0](https://github.com/allenai/pdffigures2)
- [DeepFigures](https://github.com/allenai/deepfigures-open)

_Consider contributing a parser/exporter for your system of choice!_

### How the Validation Works
The comparison of two sets of bounding boxes is modelled as an optimal assignment problem,
with the cost function being the distance between the centres of bounding boxes.
The matching algorithm runs inside each class (Figures, Tables, Captions) individually,
and uses the Intersection over Union (IoU) to decide if two bounding boxes match.
This means that if two bounding boxes look the same, but have different classes,
no True Positives will be counted towards either of those classes.
This is in contrast to some other validation schemes which award partial points in such cases.
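
The per-class matching described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the package's implementation: the `(x, y, width, height)` box layout, the function names, and the IoU threshold of 0.5 are all hypothetical, and a brute-force search over permutations stands in for a proper assignment solver such as the Hungarian algorithm.

```python
from itertools import permutations
from math import dist

# Boxes are (x, y, width, height) tuples; this layout is an assumption.


def centre(box):
    x, y, w, h = box
    return (x + w / 2, y + h / 2)


def iou(a, b):
    """Intersection over Union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def match_class(truth, pred, iou_threshold=0.5):
    """Return (TP, FP, FN) for the boxes of one class on one page."""
    # Assign every box of the shorter list to a distinct box of the longer one.
    swapped = len(truth) > len(pred)
    small, large = (pred, truth) if swapped else (truth, pred)
    best_cost, best_perm = float("inf"), ()
    # Brute force over all assignments: fine for a handful of boxes per page.
    for perm in permutations(range(len(large)), len(small)):
        cost = sum(
            dist(centre(small[i]), centre(large[j])) for i, j in enumerate(perm)
        )
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    # An assigned pair only counts as a True Positive if its IoU is high enough.
    tp = sum(
        iou(small[i], large[j]) >= iou_threshold for i, j in enumerate(best_perm)
    )
    return tp, len(pred) - tp, len(truth) - tp


def evaluate_page(truth_by_class, pred_by_class):
    """Run the matching separately inside each class."""
    classes = sorted(set(truth_by_class) | set(pred_by_class))
    return {
        c: match_class(truth_by_class.get(c, []), pred_by_class.get(c, []))
        for c in classes
    }


# Identical boxes with different classes yield no True Positives.
box = (10, 10, 100, 50)
result = evaluate_page({"Figure": [box]}, {"Table": [box]})
print(result)  # {'Figure': (0, 0, 1), 'Table': (0, 1, 0)}
```

In the usage example, the ground-truth Figure becomes a False Negative and the predicted Table a False Positive, which mirrors the class-mismatch behaviour described above.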

The reference validation runs for all referenced classes at the same time (Figures and Tables in our case).
It does not take the bounding boxes' shape or class into account,
only whether each reference matches the closest bounding box in the corresponding prediction set.
For more information on how this works, refer to the thesis which spawned this project.

## Installation & Usage
This tool is packaged under the name [sci-annot-eval](https://pypi.org/project/sci-annot-eval/).

You can install it with `pip install sci-annot-eval` or `conda install sci-annot-eval`.

Once installed, call the package from your CLI with `sci-annot-eval COMMAND`, or use it as a library in your Python project.

## Development Setup
If you wish to work on this project locally, you'll need:
- Python 3.9+
- pipenv

To set up the dependencies, just run `pipenv install` in the project root.
From that point on, you can run `pipenv shell`, which launches a Python environment with all of the dependencies installed.

When developing, you can call `python3 cli.py` in the project root to execute the local version of sci-annot-eval instead of the installed one.

## TODO
- Fix logging
- Add more tests

            
