doclayout-yolo

Name: doclayout-yolo
Version: 0.0.1
Summary: Ultralytics YOLOv8 for SOTA object detection, multi-object tracking, instance segmentation, pose estimation and image classification.
Upload time: 2024-10-11 09:10:45
Author: Glenn Jocher, Ayush Chaurasia, Jing Qiu
Maintainer: Glenn Jocher, Ayush Chaurasia, Jing Qiu
Requires Python: >=3.8
License: AGPL-3.0
Keywords: machine-learning, deep-learning, computer-vision, ml, dl, ai, yolo, yolov3, yolov5, yolov8, hub, ultralytics
# [DocLayout-YOLO: Advancing Document Layout Analysis with Mesh-candidate Bestfit and Global-to-local perception](https://arxiv.org/abs/2405.14458)


Official PyTorch implementation of **DocLayout-YOLO**.

[DocLayout-YOLO: Advancing Document Layout Analysis with Mesh-candidate Bestfit and Global-to-local perception](https://arxiv.org/abs/2405.14458).\
Zhiyuan Zhao, Hengrui Kang, Bin Wang, Conghui He\
[![arXiv](https://img.shields.io/badge/arXiv-2405.14458-b31b1b.svg)](https://arxiv.org/abs/2405.14458)

<details>
  <summary>
  <font size="+1">Abstract</font>
  </summary>
Document Layout Analysis (DLA) plays a critical role in real-world document understanding systems, yet it confronts a challenging speed-accuracy dilemma: multimodal methods leveraging text and visual features provide higher accuracy but suffer from glacial speed, whereas unimodal methods relying solely on visual features offer faster speed but at the cost of compromised accuracy. In addressing this dilemma, we introduce DocLayout-YOLO, which not only enhances accuracy but also preserves the speed advantage through optimization from pre-training and model perspectives in a document-tailored manner. In terms of robust document pretraining, we innovatively regard document synthesis as a 2D bin packing problem and introduce Mesh-candidate Bestfit, which enables the generation of large-scale, diverse document datasets. The model, pre-trained on the resulting DocSynth300K dataset, significantly enhances fine-tuning performance across a variety of document types. In terms of model enhancement for document understanding, we propose a Global-to-local Controllable Receptive Module which emulates the human visual process from global to local perspectives and features a controllable module for feature extraction and integration. Furthermore, to validate performance across different document types, we propose a complex and challenging benchmark named DocStructBench. Experimental results on extensive downstream datasets show that the proposed DocLayout-YOLO excels at both speed and accuracy. Code, data, and model will be made publicly available.
</details>
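The abstract casts synthetic page generation as a 2D bin packing problem (Mesh-candidate Bestfit). As a purely illustrative toy sketch of the best-fit idea (not the paper's actual algorithm; all names below are ours), a greedy step picks, for each free page region, the candidate element that wastes the least area:

```python
# Toy best-fit matching between free page regions and candidate elements.
# Illustrative only: the real Mesh-candidate Bestfit works over a mesh of
# grid candidates on real document elements.

def best_fit(region, candidates):
    """Return the candidate (w, h) that fits inside `region` with the
    least wasted area, or None if nothing fits."""
    rw, rh = region
    fitting = [(w, h) for (w, h) in candidates if w <= rw and h <= rh]
    if not fitting:
        return None
    # Wasted area = region area minus candidate area; smaller is better.
    return min(fitting, key=lambda c: rw * rh - c[0] * c[1])

def fill_page(regions, candidates):
    """Greedily assign at most one candidate per free region."""
    pool = list(candidates)
    layout = []
    for region in regions:
        choice = best_fit(region, pool)
        if choice is not None:
            pool.remove(choice)
            layout.append((region, choice))
    return layout
```

Packing diverse element candidates this way yields dense, varied synthetic layouts, which is the intuition behind the DocSynth300K pretraining data.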


## Installation
A `conda` virtual environment is recommended.
```bash
conda create -n doclayout_yolo python=3.9
conda activate doclayout_yolo
pip install -r requirements.txt
pip install -e .
```

## Data Preparation

1. Specify the dataset root path.

Find your Ultralytics config file (for Linux users, `$HOME/.config/Ultralytics/settings.yaml`) and set `datasets_dir` to the project root path.
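For example, assuming the repository was cloned to `/path/to/DocLayout-YOLO` (a placeholder path; substitute your own), the relevant line in `settings.yaml` would look like:

```yaml
# $HOME/.config/Ultralytics/settings.yaml
datasets_dir: /path/to/DocLayout-YOLO   # placeholder: your project root
```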

2. Download the prepared YOLO-format D4LA and DocLayNet data from below and place them under `./layout_data`; the file structure is as follows:

```bash
./layout_data
├── D4LA
│   ├── images
│   ├── labels
│   ├── test.txt
│   └── train.txt
└── doclaynet
    ├── images
    ├── labels
    ├── val.txt
    └── train.txt
```
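Each file under `labels/` follows the standard YOLO annotation format: one line per box, `class_id x_center y_center width height`, with coordinates normalized to [0, 1]. A minimal sanity-check sketch (the function names are ours, not part of this repo):

```python
def parse_yolo_label(line):
    """Parse one YOLO-format annotation line into (class_id, box)."""
    cls, xc, yc, w, h = line.split()
    box = tuple(float(v) for v in (xc, yc, w, h))
    # Normalized coordinates must lie in [0, 1].
    assert all(0.0 <= v <= 1.0 for v in box), f"bad box: {box}"
    return int(cls), box

def check_label_file(path):
    """Validate every annotation line in a YOLO label file."""
    with open(path) as f:
        return [parse_yolo_label(ln) for ln in f if ln.strip()]
```

Running `check_label_file` over `labels/*.txt` before training is a quick way to catch malformed or un-normalized annotations.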


## Acknowledgement

The code base is built with [ultralytics](https://github.com/ultralytics/ultralytics) and [YOLO-v10](https://github.com/THU-MIG/yolov10).

Thanks for the great implementations! 

## Citation

If our code or models help your work, please cite our paper:
```BibTeX
@misc{zhao2024doclayoutyolo,
      title={DocLayout-YOLO: Advancing Document Layout Analysis with Mesh-candidate Bestfit and Global-to-local perception}, 
      author={Zhiyuan Zhao and Hengrui Kang and Bin Wang and Conghui He},
      year={2024},
      eprint={2405.14458},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```

            
