# DocLayout-YOLO: Advancing Document Layout Analysis with Mesh-candidate Bestfit and Global-to-local perception
Official PyTorch implementation of **DocLayout-YOLO**.
Zhiyuan Zhao, Hengrui Kang, Bin Wang, Conghui He
<details>
<summary>
<font size="+1">Abstract</font>
</summary>
We introduce DocLayout-YOLO, which not only enhances accuracy but also preserves the speed advantage through optimization from pre-training and model perspectives in a document-tailored manner. In terms of robust document pre-training, we innovatively regard document synthesis as a 2D bin packing problem and introduce Mesh-candidate Bestfit, which enables the generation of large-scale, diverse document datasets. The model, pre-trained on the resulting DocSynth300K dataset, significantly enhances fine-tuning performance across a variety of document types. In terms of model enhancement for document understanding, we propose a Global-to-local Controllable Receptive Module which emulates the human visual process from global to local perspectives and features a controllable module for feature extraction and integration. Experimental results on extensive downstream datasets show that the proposed DocLayout-YOLO excels at both speed and accuracy.
</details>
<p align="center">
<img src="assets/comp.png" width=52%>
<img src="assets/radar.png" width=44%> <br>
</p>
## Quick Start
### 1. Environment Setup
To set up your environment, follow these steps:
```bash
conda create -n doclayout_yolo python=3.10
conda activate doclayout_yolo
pip install -e .
```
**Note:** If you only need the package for inference, you can simply install it via pip:
```bash
pip install doclayout-yolo
```
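Either way, a quick import serves as a sanity check that the installation succeeded:
```python
# Quick sanity check: the main entry point should import without errors.
from doclayout_yolo import YOLOv10

print(YOLOv10)
```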
### 2. Prediction
You can perform predictions using either a script or the SDK:
- **Script**
Run the following command to make a prediction using the script:
```bash
python demo.py --model path/to/model --image-path path/to/image
```
- **SDK**
Here is an example of how to use the SDK for prediction:
```python
import cv2
from doclayout_yolo import YOLOv10
# Load the pre-trained model
model = YOLOv10("path/to/provided/model")
# Perform prediction
det_res = model.predict(
"path/to/image", # Image to predict
imgsz=1024, # Prediction image size
conf=0.2, # Confidence threshold
device="cuda:0" # Device to use (e.g., 'cuda:0' or 'cpu')
)
# Annotate and save the result
annotated_frame = det_res[0].plot(pil=True, line_width=5, font_size=20)
cv2.imwrite("result.jpg", annotated_frame)
```
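Beyond the annotated image, you can read the raw detections from the returned results. This is a minimal sketch assuming the ultralytics-style ```Results``` API (```boxes.xyxy```, ```boxes.conf```, ```boxes.cls```) that this codebase builds on:
```python
# A minimal sketch of reading raw detections, assuming the ultralytics-style
# Results API (boxes.xyxy / boxes.conf / boxes.cls) that this codebase builds on.
boxes = det_res[0].boxes
for xyxy, conf, cls in zip(boxes.xyxy, boxes.conf, boxes.cls):
    name = det_res[0].names[int(cls)]  # class id -> layout category name
    x1, y1, x2, y2 = (float(v) for v in xyxy)
    print(f"{name}: conf={float(conf):.2f} box=({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})")
```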
We provide a model fine-tuned on **DocStructBench** for prediction, **which is capable of handling various document types**. The model can be downloaded from [here](https://huggingface.co/juliozhao/DocLayout-YOLO-DocStructBench/tree/main) and example images can be found under ```assets/example```.
<p align="center">
<img src="assets/showcase.png" width=100%> <br>
</p>
You can also use ```predict_single.py``` for prediction with custom inference settings. For batch processing, please refer to [PDF-Extract-Kit](https://github.com/opendatalab/PDF-Extract-Kit/tree/main).
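For a quick folder-level loop before reaching for a full pipeline, here is a hedged sketch that annotates every image in a directory; the directory names are illustrative, not part of the package.
```python
# A hedged sketch of simple folder-level batch prediction; directory names are
# illustrative. For production-scale pipelines, see PDF-Extract-Kit above.
from pathlib import Path

import cv2
from doclayout_yolo import YOLOv10

model = YOLOv10("path/to/provided/model")
out_dir = Path("outputs")
out_dir.mkdir(exist_ok=True)

for image_path in sorted(Path("assets/example").glob("*.jpg")):
    det_res = model.predict(str(image_path), imgsz=1024, conf=0.2, device="cuda:0")
    annotated = det_res[0].plot(pil=True, line_width=5, font_size=20)
    cv2.imwrite(str(out_dir / image_path.name), annotated)
```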
## Training and Evaluation on Public DLA Datasets
### Data Preparation
1. Specify the data root path

Find your Ultralytics config file (for Linux users, at ```$HOME/.config/Ultralytics/settings.yaml```) and set ```datasets_dir``` to the project root path (a scripted sketch appears after the file tree below).
2. Download the prepared YOLO-format D4LA and DocLayNet data from the links below and put them under ```./layout_data```:
| Dataset | Download |
|:--:|:--:|
| D4LA | [link](https://huggingface.co/datasets/juliozhao/doclayout-yolo-D4LA) |
| DocLayNet | [link](https://huggingface.co/datasets/juliozhao/doclayout-yolo-DocLayNet) |
The file structure is as follows:
```bash
./layout_data
├── D4LA
│ ├── images
│ ├── labels
│ ├── test.txt
│ └── train.txt
└── doclaynet
├── images
├── labels
├── val.txt
└── train.txt
```
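If you prefer not to hand-edit the settings file, here is a minimal sketch for step 1 that sets ```datasets_dir``` programmatically. It assumes PyYAML is installed, the default Linux settings location, and that you run it from the project root.
```python
# A minimal sketch for step 1 above: point Ultralytics' datasets_dir at the
# project root. Assumes PyYAML and the default Linux settings path.
from pathlib import Path

import yaml

settings_path = Path.home() / ".config" / "Ultralytics" / "settings.yaml"
settings = yaml.safe_load(settings_path.read_text())
settings["datasets_dir"] = str(Path.cwd())  # run this from the project root
settings_path.write_text(yaml.safe_dump(settings))
```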
### Training and Evaluation
Training is conducted on 8 GPUs with a global batch size of 64 (8 images per device). Detailed settings and checkpoints are as follows:
| Dataset | Model | DocSynth300K Pretrained? | imgsz | Learning rate | Finetune | Evaluation | AP50 | mAP | Checkpoint |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| D4LA | DocLayout-YOLO | ✗ | 1600 | 0.04 | [command](assets/script.sh#L5) | [command](assets/script.sh#L11) | 81.7 | 69.8 | [checkpoint](https://huggingface.co/juliozhao/DocLayout-YOLO-D4LA-from_scratch) |
| D4LA | DocLayout-YOLO | ✓ | 1600 | 0.04 | [command](assets/script.sh#L8) | [command](assets/script.sh#L11) | 82.4 | 70.3 | [checkpoint](https://huggingface.co/juliozhao/DocLayout-YOLO-D4LA-Docsynth300K_pretrained) |
| DocLayNet | DocLayout-YOLO | ✗ | 1120 | 0.02 | [command](assets/script.sh#L14) | [command](assets/script.sh#L20) | 93.0 | 77.7 | [checkpoint](https://huggingface.co/juliozhao/DocLayout-YOLO-DocLayNet-from_scratch) |
| DocLayNet | DocLayout-YOLO | ✓ | 1120 | 0.02 | [command](assets/script.sh#L17) | [command](assets/script.sh#L20) | 93.4 | 79.7 | [checkpoint](https://huggingface.co/juliozhao/DocLayout-YOLO-DocLayNet-Docsynth300K_pretrained) |
The DocSynth300K pretrained model can be downloaded from [here](https://huggingface.co/juliozhao/DocLayout-YOLO-DocSynth300K-pretrain). During evaluation, change ```checkpoint.pt``` to the path of the model to be evaluated.
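If you prefer the SDK over the linked shell commands, the table's D4LA settings can be reproduced roughly as follows. This is a hedged sketch assuming the ultralytics-style ```train()```/```val()``` API carries over to ```YOLOv10```; the dataset YAML and checkpoint filenames are illustrative.
```python
# A hedged sketch mirroring the D4LA rows of the table above; filenames are
# illustrative, not fixed names shipped with the repo.
from doclayout_yolo import YOLOv10

model = YOLOv10("docsynth300k_pretrained.pt")  # DocSynth300K-pretrained weights
model.train(
    data="d4la.yaml",          # YOLO-format dataset definition under ./layout_data
    imgsz=1600,                # image size from the table above
    lr0=0.04,                  # initial learning rate from the table above
    batch=64,                  # global batch size (8 images per device on 8 GPUs)
    device="0,1,2,3,4,5,6,7",  # 8-GPU training as described above
)
metrics = model.val(data="d4la.yaml", imgsz=1600)  # reports AP50 / mAP
```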
## Acknowledgement
The codebase is built on [ultralytics](https://github.com/ultralytics/ultralytics) and [YOLO-v10](https://github.com/THU-MIG/yolov10).
Thanks for their great work!