- **Name**: kitty-doc
- **Version**: 0.1.0
- **Summary**: A practical tool for converting PDF to Markdown
- **Upload time**: 2025-09-16 18:15:02
- **Requires Python**: `<3.14,>=3.10`
- **License**: Apache 2.0
- **Keywords**: kitty-doc, kitty_doc, onnx, convert, pdf, markdown
- **Requirements**: boto3, loguru, numpy, pdfminer.six, tqdm, requests, pypdfium2, pypdf, reportlab, pdftext, json-repair, fast-langdetect, scikit-image, openai, beautifulsoup4, pydantic, matplotlib, ftfy, shapely, torch, torchvision, onnxruntime, openvino, tokenizers, rapidocr

# KittyDoc: High-Speed Document Parsing Pipeline

**Built on [MinerU](https://github.com/opendatalab/MinerU) with the VLM removed. KittyDoc focuses on efficient document parsing with the pipeline backend and keeps a reasonable parsing speed even on CPU.**

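## 😺 Introduction

KittyDoc is a lightweight open-source framework focused on document parsing. It supports **OCR, layout analysis, formula recognition, table recognition, and reading-order recovery**.  
Compared with the original framework, this project uses the **[PP-StructureV3](https://www.paddleocr.ai/main/version3.x/pipeline_usage/PP-StructureV3.html) model series** and **removes the Paddle dependency entirely**. All models have been converted to ONNX and can run directly on **ONNX Runtime** or, for some models, **OpenVINO**.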

---

> ✨ If this project helps you, a star is what keeps me improving it!
>
> - [GitHub repository](https://github.com/hzkitty/KittyDoc)
> - [Gitee repository](https://gitee.com/hzkitty/KittyDoc)

## 👏 Features

- **OCR**
  - Uses [RapidOCR](https://github.com/RapidAI/RapidOCR), which supports multiple inference engines
  - Defaults to OpenVINO on CPU and to torch on GPU
  
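- **Layout detection**
  - Uses the `PP-DocLayout` series of ONNX models (plus-L, L, M, S)
    - **PP-DocLayout_plus-L**: best accuracy, somewhat slower
    - **PP-DocLayout-L**: fast with good accuracy; the default
    - **PP-DocLayout-S**: extremely fast, but may miss some regions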

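- **Formula recognition**
  - Uses the `PP-FormulaNet_plus` series of ONNX models (L, M, S)
    - **PP-FormulaNet_plus-L**: slow
    - **PP-FormulaNet_plus-S**: fastest; the default
  - Can be configured to recognize display (block) formulas only
  - In CUDA environments the GPU is not used for formulas by default: ONNX GPU inference with the formula models raises errors, and the upstream issues are still unresolved. See [PaddleOCR/issues/15125](https://github.com/PaddlePaddle/PaddleOCR/issues/15125), [PaddleX/issues/4238](https://github.com/PaddlePaddle/PaddleX/issues/4238), [Paddle2ONNX/issues/1593](https://github.com/PaddlePaddle/Paddle2ONNX/issues/1593)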

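- **Table recognition**
  - Builds on [rapid_table_self](kitty_doc%2Fmodel%2Ftable%2Frapid_table_self) and extends it into a multi-model cascade:
    - **Table classification** (distinguishes wired, i.e. bordered, tables from wireless, i.e. borderless, tables)
    - **SLANeXt** series for table structure recognition plus cell detection
    - **[Wired table recognition with UNET](https://github.com/RapidAI/TableStructureRec)** plus SLANET_plus/UNITABLE (for wireless table recognition)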

- **Reading-order recovery**
  - Uses a simplified version of the PP-StructureV3 `xycut++` reading-order algorithm
  - Fast, with good reading-order results

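- **Inference**
  - All models run on ONNX Runtime; OCR can be configured to use other inference engines
  - OpenVINO inference fails for every model except OCR and PP-DocLayout-M/S, and this is hard to fix for now. See [PaddleOCR/issues/16277](https://github.com/PaddlePaddle/PaddleOCR/issues/16277)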
---
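The reading-order feature above is based on XY-cut. As a rough illustration of the idea, here is a minimal sketch of the classic recursive XY-cut: it is neither the `xycut++` variant from PP-StructureV3 nor KittyDoc's simplified implementation. The function names and the `(x0, y0, x1, y1)` box format are assumptions made for this example.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), origin at top-left


def split_by_gaps(indices: Sequence[int], boxes: Sequence[Box], axis: int) -> List[List[int]]:
    """Group boxes separated by empty gaps along one axis (0 = x, 1 = y)."""
    lo, hi = axis, axis + 2
    ordered = sorted(indices, key=lambda i: boxes[i][lo])
    groups = [[ordered[0]]]
    reach = boxes[ordered[0]][hi]  # furthest extent covered so far
    for i in ordered[1:]:
        if boxes[i][lo] > reach:   # empty gap found: start a new group
            groups.append([i])
        else:
            groups[-1].append(i)
        reach = max(reach, boxes[i][hi])
    return groups


def xy_cut(boxes: Sequence[Box], indices: Optional[Sequence[int]] = None) -> List[int]:
    """Return box indices in reading order using recursive XY-cut."""
    if indices is None:
        indices = list(range(len(boxes)))
    if len(indices) <= 1:
        return list(indices)
    for axis in (1, 0):  # try horizontal cuts (rows) first, then vertical cuts (columns)
        groups = split_by_gaps(indices, boxes, axis)
        if len(groups) > 1:
            return [i for group in groups for i in xy_cut(boxes, group)]
    # No clean cut exists: fall back to top-to-bottom, left-to-right order.
    return sorted(indices, key=lambda i: (boxes[i][1], boxes[i][0]))


if __name__ == "__main__":
    page = [
        (55, 65, 100, 90),  # 0: right column, lower block
        (0, 0, 100, 10),    # 1: full-width title
        (0, 55, 45, 90),    # 2: left column, lower block
        (55, 20, 100, 60),  # 3: right column, upper block
        (0, 20, 45, 50),    # 4: left column, upper block
    ]
    print(xy_cut(page))  # [1, 4, 2, 3, 0]
```

In this layout the title is read first, then the left column top to bottom, then the right column. Plain XY-cut struggles when no clean gap exists, for example with L-shaped layouts or blocks that span columns.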

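The table-recognition cascade described in the feature list first classifies a table and then hands it to a type-specific model. The sketch below shows only that routing step, with stand-in callables. `TableResult`, `recognize_table`, and the `"wired"`/`"wireless"` labels are hypothetical names chosen for illustration and do not reflect KittyDoc's actual API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class TableResult:
    table_type: str  # label returned by the classifier, e.g. "wired" or "wireless"
    html: str        # recognized table structure as HTML


def recognize_table(
    image: Any,
    classify: Callable[[Any], str],
    recognizers: Dict[str, Callable[[Any], str]],
) -> TableResult:
    """Classify the table, then dispatch to the recognizer registered for that type."""
    table_type = classify(image)
    if table_type not in recognizers:
        raise ValueError(f"no recognizer registered for {table_type!r}")
    return TableResult(table_type, recognizers[table_type](image))


if __name__ == "__main__":
    # Stand-ins for real models: a classifier and one recognizer per table type.
    result = recognize_table(
        image="page-1-table-0",
        classify=lambda img: "wired",
        recognizers={
            "wired": lambda img: "<table><!-- wired model output --></table>",
            "wireless": lambda img: "<table><!-- wireless model output --></table>",
        },
    )
    print(result.table_type, result.html)
```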
## 🛠️ Installing KittyDoc

#### Install with pip
```bash
pip install kitty-doc -i https://mirrors.aliyun.com/pypi/simple
```
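The package metadata declares `requires_python` as `<3.14,>=3.10`. One way to confirm that an interpreter falls in that range before installing is sketched below. The helper name `python_supported` is illustrative and not part of KittyDoc.

```python
import sys

# Bounds taken from the package metadata: requires_python "<3.14,>=3.10".
MIN_VERSION = (3, 10)  # inclusive lower bound
MAX_VERSION = (3, 14)  # exclusive upper bound


def python_supported(version=None):
    """Return True if (major, minor) lies in the range [3.10, 3.14)."""
    major, minor = (version or sys.version_info)[:2]
    return MIN_VERSION <= (major, minor) < MAX_VERSION


if __name__ == "__main__":
    current = "%d.%d" % sys.version_info[:2]
    status = "supported" if python_supported() else "not supported"
    print(f"Python {current} is {status} by kitty-doc 0.1.0")
```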

#### Install from source
```bash
# Clone the repository
git clone https://github.com/hzkitty/KittyDoc.git
cd KittyDoc

# Install dependencies
pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple
```
#### GPU inference
```bash
# After installing kitty_doc, uninstall the CPU build of onnxruntime
pip uninstall onnxruntime
# Make sure the onnxruntime-gpu version matches your GPU environment
# See https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements
pip install onnxruntime-gpu
```
```python
# Select the GPU in Python (must be set before importing kitty_doc)
import os
# Use the default GPU (cuda:0)
os.environ['MINERU_DEVICE_MODE'] = "cuda"
# Or pick a GPU by index instead, for example the second GPU (cuda:1):
# os.environ['MINERU_DEVICE_MODE'] = "cuda:1"
```
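To check that the GPU setup took effect, one option is to parse the device string and ask ONNX Runtime which execution providers it exposes. The sketch below is an illustration, not KittyDoc's own code: `parse_device_mode` and `pick_providers` are hypothetical helpers, and treating an unset variable as `cpu` is an assumption. `onnxruntime.get_available_providers()` is the standard ONNX Runtime call.

```python
import os


def parse_device_mode(value):
    """Split a device string such as "cuda:1" into (device_type, index)."""
    device, _, index = value.partition(":")
    return device, int(index) if index else 0


def pick_providers(available, device_type):
    """Prefer the CUDA provider when a CUDA device is requested and available."""
    if device_type == "cuda" and "CUDAExecutionProvider" in available:
        return ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return ["CPUExecutionProvider"]


if __name__ == "__main__":
    # Assumption: treat an unset variable as "cpu" for the purpose of this check.
    mode = os.environ.get("MINERU_DEVICE_MODE", "cpu")
    device_type, index = parse_device_mode(mode)
    try:
        import onnxruntime as ort
        available = ort.get_available_providers()
    except ImportError:
        available = []  # onnxruntime is not installed in this environment
    print(device_type, index, pick_providers(available, device_type))
```

If `CUDAExecutionProvider` is missing from the available providers after installing `onnxruntime-gpu`, the installed build most likely does not match the local CUDA and cuDNN versions.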

---

## 📋 Usage

- [Code example](./demo/demo.py)

- [Parameter reference](./docs/analyze_param.md)

---

## Model download
If no model path is specified, the models are downloaded automatically on first run.
- [KittyDoc model collection (layout/formula/table)](https://www.modelscope.cn/models/hzkitty/KittyDoc)  
- [RapidOCR models](https://www.modelscope.cn/models/RapidAI/RapidOCR)  
- [Some table models from RapidTable](https://www.modelscope.cn/models/RapidAI/RapidTable) 

---

## 📌 TODO

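- [x] Non-OCR text extraction for tables
- [x] Cross-page table merging
- [x] Checkbox recognition with OpenCV (off by default; OpenCV-based detection produces false positives)
- [ ] Checkbox recognition with a model
- [ ] Rotated-table parsing with four-direction classification (rapid_orientation)
- [ ] Formula extraction inside tables
- [ ] Image extraction inside tables
- [ ] GPU support for formula recognition
- [ ] OpenVINO support for layout, table, and formula models
- [ ] KittyDoc4j (Java version)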


## 🙏 Acknowledgements

- [MinerU](https://github.com/opendatalab/MinerU)
- [PaddleOCR & PP-StructureV3](https://github.com/PaddlePaddle/PaddleOCR)
- [RapidOCR](https://github.com/RapidAI/RapidOCR)

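## ⚖️ License

This project is adapted from [MinerU](https://github.com/opendatalab/MinerU). The **YOLO models of the original project have been removed** and replaced with **PP-StructureV3 series ONNX models**.  
Because the AGPL-licensed YOLO model components have been removed, the project as a whole is no longer bound by the AGPL.

This project is released under the [Apache 2.0 license](LICENSE).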

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "kitty-doc",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.14,>=3.10",
    "maintainer_email": null,
    "keywords": "kitty-doc, kitty_doc, onnx, convert, pdf, markdown",
    "author": null,
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/8a/79/5e2241de7018c0b799b9c582edf41df1922a3470cffe2eb5e1531d295265/kitty_doc-0.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache 2.0",
    "summary": "A practical tool for converting PDF to Markdown",
    "version": "0.1.0",
    "project_urls": {
        "Home": "https://github.com/hzkitty",
        "Repository": "https://github.com/hzkitty/KittyDoc"
    },
    "split_keywords": [
        "kitty-doc",
        " kitty_doc",
        " onnx",
        " convert",
        " pdf",
        " markdown"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "d669430ddb4b9fef3492e1fa540d1422e883ea384180da41707caceacc602b0a",
                "md5": "ffe6c105ea54b81b543277d5b3144564",
                "sha256": "120ee4c28d17f6b307b0ca802ace59f2893e6e4bef6ed020df6f90863b192478"
            },
            "downloads": -1,
            "filename": "kitty_doc-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "ffe6c105ea54b81b543277d5b3144564",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.14,>=3.10",
            "size": 1022719,
            "upload_time": "2025-09-16T18:15:00",
            "upload_time_iso_8601": "2025-09-16T18:15:00.179217Z",
            "url": "https://files.pythonhosted.org/packages/d6/69/430ddb4b9fef3492e1fa540d1422e883ea384180da41707caceacc602b0a/kitty_doc-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "8a795e2241de7018c0b799b9c582edf41df1922a3470cffe2eb5e1531d295265",
                "md5": "3270e1336de869e5202a1a1b0ab77dcd",
                "sha256": "76d529645fb9c2abc6b352748a1563bccdd1ecf101d674f2d0372b1de3c32129"
            },
            "downloads": -1,
            "filename": "kitty_doc-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "3270e1336de869e5202a1a1b0ab77dcd",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<3.14,>=3.10",
            "size": 972555,
            "upload_time": "2025-09-16T18:15:02",
            "upload_time_iso_8601": "2025-09-16T18:15:02.254802Z",
            "url": "https://files.pythonhosted.org/packages/8a/79/5e2241de7018c0b799b9c582edf41df1922a3470cffe2eb5e1531d295265/kitty_doc-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-09-16 18:15:02",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "hzkitty",
    "github_project": "KittyDoc",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "boto3",
            "specs": [
                [
                    ">=",
                    "1.28.43"
                ]
            ]
        },
        {
            "name": "loguru",
            "specs": [
                [
                    ">=",
                    "0.6.0"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": [
                [
                    ">=",
                    "1.21.6"
                ]
            ]
        },
        {
            "name": "pdfminer.six",
            "specs": [
                [
                    "==",
                    "20250506"
                ]
            ]
        },
        {
            "name": "tqdm",
            "specs": [
                [
                    ">=",
                    "4.67.1"
                ]
            ]
        },
        {
            "name": "requests",
            "specs": []
        },
        {
            "name": "pypdfium2",
            "specs": [
                [
                    ">=",
                    "4.30.0"
                ]
            ]
        },
        {
            "name": "pypdf",
            "specs": [
                [
                    ">=",
                    "5.6.0"
                ]
            ]
        },
        {
            "name": "reportlab",
            "specs": []
        },
        {
            "name": "pdftext",
            "specs": [
                [
                    ">=",
                    "0.6.2"
                ]
            ]
        },
        {
            "name": "json-repair",
            "specs": [
                [
                    ">=",
                    "0.46.2"
                ]
            ]
        },
        {
            "name": "fast-langdetect",
            "specs": [
                [
                    "<",
                    "0.3.0"
                ],
                [
                    ">=",
                    "0.2.3"
                ]
            ]
        },
        {
            "name": "scikit-image",
            "specs": [
                [
                    ">=",
                    "0.25.0"
                ],
                [
                    "<",
                    "1.0.0"
                ]
            ]
        },
        {
            "name": "openai",
            "specs": [
                [
                    "<",
                    "2"
                ],
                [
                    ">=",
                    "1.70.0"
                ]
            ]
        },
        {
            "name": "beautifulsoup4",
            "specs": [
                [
                    ">=",
                    "4.13.5"
                ],
                [
                    "<",
                    "5"
                ]
            ]
        },
        {
            "name": "pydantic",
            "specs": [
                [
                    "<",
                    "2.11"
                ],
                [
                    ">=",
                    "2.7.2"
                ]
            ]
        },
        {
            "name": "matplotlib",
            "specs": [
                [
                    ">=",
                    "3.10"
                ],
                [
                    "<",
                    "4"
                ]
            ]
        },
        {
            "name": "ftfy",
            "specs": [
                [
                    ">=",
                    "6.3.1"
                ],
                [
                    "<",
                    "7"
                ]
            ]
        },
        {
            "name": "shapely",
            "specs": [
                [
                    "<",
                    "3"
                ],
                [
                    ">=",
                    "2.0.7"
                ]
            ]
        },
        {
            "name": "torch",
            "specs": [
                [
                    ">=",
                    "2.6.0"
                ],
                [
                    "<",
                    "2.8.0"
                ]
            ]
        },
        {
            "name": "torchvision",
            "specs": []
        },
        {
            "name": "onnxruntime",
            "specs": [
                [
                    ">=",
                    "1.18.0"
                ]
            ]
        },
        {
            "name": "openvino",
            "specs": [
                [
                    ">=",
                    "2024.6.0"
                ]
            ]
        },
        {
            "name": "tokenizers",
            "specs": [
                [
                    ">=",
                    "0.13.2"
                ]
            ]
        },
        {
            "name": "rapidocr",
            "specs": [
                [
                    ">=",
                    "3.1.0"
                ],
                [
                    "<=",
                    "3.3.0"
                ]
            ]
        }
    ],
    "lcname": "kitty-doc"
}
        