docling-ibm-models


| Field | Value |
| ----- | ----- |
| Name | docling-ibm-models |
| Version | 3.2.0 |
| Summary | This package contains the AI models used by the Docling PDF conversion package |
| Upload time | 2025-01-21 15:05:16 |
| Author | Nikos Livathinos |
| Requires Python | <4.0,>=3.9 |
| License | MIT |
| Keywords | docling, convert, document, pdf, layout model, segmentation, table structure, table former |

[![PyPI version](https://img.shields.io/pypi/v/docling-ibm-models)](https://pypi.org/project/docling-ibm-models/)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/docling-ibm-models)](https://pypi.org/project/docling-ibm-models/)
[![Poetry](https://img.shields.io/endpoint?url=https://python-poetry.org/badge/v0.json)](https://python-poetry.org/)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Imports: isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat&labelColor=ef8336)](https://pycqa.github.io/isort/)
[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://github.com/pre-commit/pre-commit)
[![Models on Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue)](https://huggingface.co/ds4sd/docling-models/)
[![License MIT](https://img.shields.io/github/license/ds4sd/deepsearch-toolkit)](https://opensource.org/licenses/MIT)

# Docling IBM models

AI modules to support the Docling PDF document conversion project.

- TableFormer is an AI module that recognizes the structure of a table and the bounding boxes of the table content.
- The Layout model is an AI model that, among other things, detects tables on a page. This package contains the inference code for the Layout model.


## Installation Instructions

### MacOS / Linux

To install `poetry` locally, use either `pip` or `homebrew`.

To install `poetry` in a Docker container, add the following to the Dockerfile:
```
ENV POETRY_NO_INTERACTION=1 \
    POETRY_VIRTUALENVS_CREATE=false

# Install poetry
RUN curl -sSL 'https://install.python-poetry.org' > install-poetry.py \
    && python install-poetry.py \
    && poetry --version \
    && rm install-poetry.py
```

To install and run the package, set up a poetry environment

```
poetry env use $(which python3.10)
poetry shell
```

and install all the dependencies,

```
poetry install # this will only install the deps from the poetry.lock

poetry install --no-dev # this will skip installing dev dependencies (Poetry 1.2+ deprecates --no-dev in favor of --without dev)
```

To update or add new dependencies from `pyproject.toml`, rebuild `poetry.lock`:
```
poetry update
```

#### MacOS Intel

When developing on MacOS with Intel chips, use the compatible dependencies with:

```console
poetry update --with mac_intel
```


## Pipeline Overview
![Architecture](docs/tablemodel_overview_color.png)

## Datasets
Below we list the datasets used, with their description, source, and ***"TableFormer Format"***. The TableFormer Format is our processed version of the original format. It works with the dataloader out of the box and, when necessary, augments the dataset with missing ground truth (bounding boxes for empty cells).


| Name        | Description      | URL |
| ------------- |:-------------:|----|
| PubTabNet | PubTabNet contains 516k+ heterogeneous tables, in both image and HTML format, from the PubMed Central Open Access Subset. | [PubTabNet](https://developer.ibm.com/exchanges/data/all/pubtabnet/) |
| FinTabNet | A dataset of financial report tables with corresponding ground-truth location and structure. It includes 112k+ tables. | [FinTabNet](https://developer.ibm.com/exchanges/data/all/fintabnet/) |
| TableBank | TableBank is an image-based table detection and recognition dataset built with weak supervision from Word and LaTeX documents on the internet. It contains 417K high-quality labeled tables. | [TableBank](https://github.com/doc-analysis/TableBank) |

## Models

### TableModel04
![TableModel04](docs/tbm04.png)
**TableModel04rs (OTSL)** is our SOTA method. It uses transformers to predict the table structure and the bounding boxes.


## Configuration file

An example configuration can be found inside the test `tests/test_tf_predictor.py`.
These are the main sections of the configuration file:

- `dataset`: The directory for prepared data and the parameters used during the data loading.
- `model`: The type, name and hyperparameters of the model. Also the directory to save/load the
  trained checkpoint files.
- `train`: Parameters for the training of the model.
- `predict`: Parameters for the evaluation of the model.
- `dataset_wordmap`: The token maps. This section is essential.
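
The section layout above can be checked with a few lines of Python. This is a minimal sketch, not the package's own validation: only the five top-level section names come from this README, while `REQUIRED_SECTIONS`, `missing_sections`, and the empty placeholder values are hypothetical names introduced for illustration. The real keys inside each section are in `tests/test_tf_predictor.py`.

```python
# Top-level sections named in this README. Their inner keys are not listed
# here, so the example below leaves every section empty on purpose.
REQUIRED_SECTIONS = ("dataset", "model", "train", "predict", "dataset_wordmap")


def missing_sections(config: dict) -> list[str]:
    """Return the required top-level sections that are absent from config."""
    return [name for name in REQUIRED_SECTIONS if name not in config]


# Hypothetical skeleton: placeholders only, not real parameters.
example_config = {
    "dataset": {},          # prepared-data directory and loading parameters
    "model": {},            # type, name, hyperparameters, checkpoint directory
    "train": {},            # training parameters
    "predict": {},          # evaluation parameters
    "dataset_wordmap": {},  # token maps
}

print(missing_sections(example_config))
```

Calling `missing_sections` before handing a configuration to a predictor gives an early, readable error instead of a `KeyError` deep inside the model code.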


## Model weights

You can download the model weights and config files from the following links:

- [TableFormer Checkpoint](https://huggingface.co/ds4sd/docling-models/tree/main/model_artifacts/tableformer)
- [beehive_v0.0.5](https://huggingface.co/ds4sd/docling-models/tree/main/model_artifacts/layout/beehive_v0.0.5)
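
The same artifacts can also be fetched from a script. The sketch below assumes the separate `huggingface_hub` package is installed and uses its `snapshot_download` function. The repository id and the two folder paths are taken from the links above, while `ARTIFACT_DIRS`, `artifact_patterns`, `download_artifacts`, and the `weights` target directory are hypothetical names chosen for this example.

```python
from pathlib import Path

# Repository and folders taken from the links in this README.
REPO_ID = "ds4sd/docling-models"
ARTIFACT_DIRS = {
    "tableformer": "model_artifacts/tableformer",
    "layout": "model_artifacts/layout/beehive_v0.0.5",
}


def artifact_patterns(names):
    """Build glob patterns that select only the requested artifact folders."""
    return [f"{ARTIFACT_DIRS[name]}/*" for name in names]


def download_artifacts(names=("tableformer", "layout"), local_dir="weights"):
    """Download the selected folders and return their local paths.

    Assumption: huggingface_hub is installed. It is imported here so that
    the helpers above stay usable without it.
    """
    from huggingface_hub import snapshot_download

    root = snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=artifact_patterns(names),
        local_dir=local_dir,
    )
    return {name: Path(root) / ARTIFACT_DIRS[name] for name in names}
```

For example, `download_artifacts(["layout"])` would fetch only the layout model folder.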


## Inference Tests

You can run the inference tests for the models with:

```
python -m pytest tests/
```

This also generates prediction and matching visualizations, which are written to:
`tests/test_data/viz/`

Visualization outlines:
- `Light Pink`: border of recognized table
- `Grey`: OCR cells
- `Green`: prediction bboxes
- `Red`: OCR cells matched with prediction
- `Blue`: post-processed match
- `Bold Blue`: column header
- `Bold Magenta`: row header
- `Bold Brown`: section row (if the table has one)
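
After a test run, a short helper can confirm that visualizations were actually produced. This is a sketch under stated assumptions: the directory is the one named above, the file names and extensions of the generated images are not documented here, so the helper lists every file it finds. `list_visualizations` is a hypothetical name.

```python
from pathlib import Path


def list_visualizations(viz_dir="tests/test_data/viz"):
    """Return all files under viz_dir, sorted, or [] if it does not exist."""
    root = Path(viz_dir)
    if not root.is_dir():
        return []
    return sorted(path for path in root.rglob("*") if path.is_file())


for path in list_visualizations():
    print(path)
```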


## Demo

The demo application applies the `LayoutPredictor` to a directory `<input_dir>` that contains
`png` images and writes visualizations of the predictions to another directory `<viz_dir>`.

First download the model weights (see above), then run:
```
python -m demo.demo_layout_predictor -i <input_dir> -v <viz_dir>
```

For example:
```
python -m demo.demo_layout_predictor -i tests/test_data/samples -v viz/
```
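
To run the demo from another script, the command above can be wrapped with `subprocess`. The sketch below only reuses the module name and the `-i` / `-v` flags shown in this section. `build_demo_command` and `run_demo` are hypothetical helper names, and the check for `png` files is an added convenience, not something the demo is documented to require.

```python
import subprocess
import sys
from pathlib import Path


def build_demo_command(input_dir, viz_dir):
    """Assemble the demo invocation shown above as an argument list."""
    return [
        sys.executable, "-m", "demo.demo_layout_predictor",
        "-i", str(input_dir),
        "-v", str(viz_dir),
    ]


def run_demo(input_dir, viz_dir):
    """Run the demo, failing early if input_dir holds no png images."""
    if not any(Path(input_dir).glob("*.png")):
        raise FileNotFoundError(f"no png images found in {input_dir}")
    # check=True raises CalledProcessError if the demo exits with an error.
    subprocess.run(build_demo_command(input_dir, viz_dir), check=True)
```

Calling `run_demo("tests/test_data/samples", "viz/")` from the repository root mirrors the example command above. Using `sys.executable` keeps the demo inside the currently active poetry environment.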

            
