phocr


Name: phocr
Version: 1.0.0
Home page: https://github.com/puhuilab/phocr
Summary: High-Performance OCR Toolkit
Upload time: 2025-07-13 13:40:54
Maintainer: None
Docs URL: None
Author: PuHui Lab
Requires Python: >=3.8
License: None
Keywords: ocr, text recognition, computer vision, deep learning
Requirements: pyclipper, opencv_python, numpy, six, Shapely, PyYAML, Pillow, tqdm, datasets, omegaconf, colorlog, onnx, Levenshtein, regex
# PHOCR: High-Performance OCR Toolkit

[English](README.md) | [简体中文](README_CN.md)

PHOCR is an open, high-performance Optical Character Recognition (OCR) toolkit designed for efficient text recognition across multiple languages, including Chinese, Japanese, Korean, Russian, Vietnamese, and Thai. **PHOCR features a completely custom-developed recognition model (PH-OCRv1) that significantly outperforms existing solutions.**

## Motivation

Current token-prediction-based model architectures are highly sensitive to the accuracy of contextual tokens. Repetitive patterns, even as few as a thousand instances, can lead to persistent memorization by the model. While most open-source text recognition models currently achieve character error rates (CER) in the percent range, our goal is to push this into the per-mille range. At that level, a system processing 100 million characters would make on the order of hundreds of thousands of recognition errors rather than millions, an order-of-magnitude improvement.
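
As a quick sanity check on this arithmetic (illustrative only, not a benchmark result):

```python
# Expected error counts for a corpus of 100 million characters at
# percent-range vs. per-mille-range character error rates.
total_chars = 100_000_000
for cer in (0.01, 0.001):
    print(f"CER {cer:.1%}: ~{int(total_chars * cer):,} errors")
# CER 1.0%: ~1,000,000 errors
# CER 0.1%: ~100,000 errors
```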

## Features

- **Custom Recognition Model**: **PH-OCRv1** achieves a sub-0.x% character error rate in document-style settings by leveraging open-source models, and even reaches a 0.0x% character error rate in English.
- **Multi-language Support**: Chinese, English, Japanese, Korean, Russian, and more
- **Rich Vocabulary**: Comprehensive vocabulary for each language (Chinese: 15,316; Korean: 17,388; Japanese: 11,186; Russian: 292)
- **High Performance**: Optimized inference engine with ONNX Runtime support
- **Easy Integration**: Simple Python API for quick deployment
- **Cross-platform**: Support for CPU and CUDA

## Visualization

![Visualization](./vis.gif)

## Installation

```bash
pip install phocr
```

## Quick Start

```python
from phocr import PHOCR

# Initialize OCR engine
engine = PHOCR()

# Perform OCR on image
result = engine("path/to/image.jpg")
print(result)

# Visualize results
result.vis("output.jpg")
print(result.to_markdown())
```

## Benchmarks

We conducted comprehensive benchmarks comparing PHOCR with leading OCR solutions across multiple languages and scenarios. **Our custom-developed PH-OCRv1 model demonstrates significant improvements over existing solutions.**

### Overall Performance Comparison

<table style="width: 90%; margin: auto; border-collapse: collapse; font-size: small;">
  <thead>
    <tr>
      <th rowspan="2">Model</th>
      <th colspan="4">ZH & EN<br><span style="font-weight: normal; font-size: x-small;">CER ↓</span></th>
      <th colspan="2">JP<br><span style="font-weight: normal; font-size: x-small;">CER ↓</span></th>
      <th colspan="2">KO<br><span style="font-weight: normal; font-size: x-small;">CER ↓</span></th>
      <th colspan="1">RU<br><span style="font-weight: normal; font-size: x-small;">CER ↓</span></th>
    </tr>
    <tr>
      <th><i>English</i></th>
      <th><i>Simplified Chinese</i></th>
      <th><i>EN CH Mixed</i></th>
      <th><i>Traditional Chinese</i></th>
      <th><i>Document</i></th>
      <th><i>Scene</i></th>
      <th><i>Document</i></th>
      <th><i>Scene</i></th>
      <th><i>Document</i></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>PHOCR</td>
      <td><strong>0.0008</strong></td>
      <td><strong>0.0057</strong></td>
      <td><strong>0.0171</strong></td>
      <td><strong>0.0145</strong></td>
      <td><strong>0.0039</strong></td>
      <td><strong>0.0197</strong></td>
      <td><strong>0.0050</strong></td>
      <td><strong>0.0255</strong></td>
      <td><strong>0.0046</strong></td>
    </tr>
    <tr>
      <td>Baidu</td>
      <td>0.0014</td>
      <td>0.0069</td>
      <td>0.0354</td>
      <td>0.0431</td>
      <td>0.0222</td>
      <td>0.0607</td>
      <td>0.0238</td>
      <td>0.212</td>
      <td>0.0786</td>
    </tr>
    <tr>
      <td>Ali</td>
      <td>-</td>
      <td>-</td>
      <td>-</td>
      <td>-</td>
      <td>0.0272</td>
      <td>0.0564</td>
      <td>0.0159</td>
      <td>0.102</td>
      <td>0.0616</td>
    </tr>
  </tbody>
</table>


Notes:

- Baidu: [Baidu Accurate API](https://ai.baidu.com/tech/ocr/general)
- Ali: [Aliyun API](https://help.aliyun.com/zh/ocr/product-overview/recognition-of-characters-in-languages-except-for-chinese-and-english-1)
- CER: total edit distance divided by the total number of characters in the ground truth.
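
For reference, this metric can be computed with the `Levenshtein` package (already listed in the requirements). The helper below is an illustrative sketch, not PHOCR's internal scoring code:

```python
import Levenshtein

def corpus_cer(predictions, references):
    """Total edit distance divided by total ground-truth characters."""
    total_distance = sum(Levenshtein.distance(pred, ref)
                         for pred, ref in zip(predictions, references))
    total_chars = sum(len(ref) for ref in references)
    return total_distance / total_chars

print(corpus_cer(["hallo world"], ["hello world"]))  # 1 / 11 ≈ 0.0909
```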


## Advanced Usage

A simple PyTorch (CUDA) implementation with a global KV cache is provided. When running on the torch (CUDA) backend, you can enable caching by passing `use_cache=True` to `ORTSeq2Seq(...)`, which also allows larger batch sizes.
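
As a rough sketch of what this looks like in code (the import path and all constructor arguments other than `use_cache` are assumptions; check the package source for the exact signature):

```python
from phocr import ORTSeq2Seq  # assumed import path

# Only `use_cache` is mentioned above; any remaining constructor arguments
# (model paths, device selection, batch size, ...) are placeholders and must
# match whatever your local setup requires.
seq2seq = ORTSeq2Seq(
    use_cache=True,  # enable the global KV cache on the torch (CUDA) backend
)
```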

### Language-specific Configuration

See [demo.py](./demo.py) for more examples.

## Evaluation & Benchmarking

PHOCR provides comprehensive benchmarking tools to evaluate model performance across different languages and scenarios.

### Quick Benchmark

Run the complete benchmark pipeline:
```bash
sh benchmark/run_recognition.sh
```

Calculate Character Error Rate (CER) for model predictions:
```bash
sh benchmark/run_score.sh
```

### Benchmark Datasets

PHOCR uses standardized benchmark datasets for fair comparison:

- **zh_en_rec_bench**: [Chinese & English mixed text recognition](https://huggingface.co/datasets/puhuilab/zh_en_rec_bench)
- **jp_rec_bench**: [Japanese text recognition](https://huggingface.co/datasets/puhuilab/jp_rec_bench)
- **ko_rec_bench**: [Korean text recognition](https://huggingface.co/datasets/puhuilab/ko_rec_bench)
- **ru_rec_bench**: [Russian text recognition](https://huggingface.co/datasets/puhuilab/ru_rec_bench)
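
Because the `datasets` library is already listed in the requirements, the benchmark sets can presumably be pulled straight from the Hugging Face Hub. The snippet below is a sketch; the available splits and column names depend on how each benchmark is published:

```python
from datasets import load_dataset

# The dataset ID comes from the links above.
bench = load_dataset("puhuilab/zh_en_rec_bench")
print(bench)  # inspect splits and columns before wiring it into the benchmark scripts
```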

## Further Improvements

- Character error rate (CER), including punctuation, can be further reduced through additional normalization of the training corpus.
- Text detection accuracy can be further enhanced by employing a more advanced detection framework.

## Contributing

We welcome contributions! Please feel free to submit issues, feature requests, or pull requests.

## Support

For questions and support, please open an issue on GitHub or contact the maintainers.

## Acknowledgements

Many thanks to [RapidOCR](https://github.com/RapidAI/RapidOCR) for the detection model and the main framework.

## License

- This project is released under the Apache 2.0 license.
- The copyright of the OCR detection and classification model is held by Baidu.
- The PHOCR recognition models are released under a modified MIT License; see the [LICENSE](./LICENSE) file for details.

## Citation

If you use PHOCR in your research, please cite:

```bibtex
@misc{phocr2025,
  title={PHOCR: High-Performance OCR Toolkit},
  author={PuHui Lab},
  year={2025},
  url={https://github.com/puhuilab/phocr}
}
```

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/puhuilab/phocr",
    "name": "phocr",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "ocr, text recognition, computer vision, deep learning",
    "author": "PuHui Lab",
    "author_email": "contact@puhuilab.com",
    "download_url": "https://files.pythonhosted.org/packages/dd/05/2d63f6e96c66520b97b4d1e85d5f1da7c6d761dba60b70bc8e44410d4f14/phocr-1.0.0.tar.gz",
    "platform": null,
    "description": "# PHOCR: High-Performance OCR Toolkit\n\n[English](README.md) | [\u7b80\u4f53\u4e2d\u6587](README_CN.md)\n\nPHOCR is an open high-performance Optical Character Recognition (OCR) toolkit designed for efficient text recognition across multiple languages including Chinese, Japanese, Korean, Russian, Vietnamese, and Thai. **PHOCR features a completely custom-developed recognition model (PH-OCRv1) that significantly outperforms existing solutions.**\n\n## Motivation\n\nCurrent token-prediction-based model architectures are highly sensitive to the accuracy of contextual tokens. Repetitive patterns, even as few as a thousand instances, can lead to persistent memorization by the model. While most open-source text recognition models currently achieve character error rates (CER) in the percent range, our goal is to push this further into the per-mille range. At that level, for a system processing 100 million characters, the total number of recognition errors would be reduced to under 1 million \u2014 an order of magnitude improvement.\n\n## Features\n\n- **Custom Recognition Model**: **PH-OCRv1** achieves sub-0.x% character error rate in document-style settings by leveraging open-source models. Even achieves 0.0x% character error rate in English.\n- **Multi-language Support**: Chinese, English, Japanese, Korean, Russian, and more\n- **Rich Vocabulary**: Comprehensive vocabulary for each language. Chinese: 15,316, Korean: 17,388, Japanese: 11,186, Russian: 292.\n- **High Performance**: Optimized inference engine with ONNX Runtime support\n- **Easy Integration**: Simple Python API for quick deployment\n- **Cross-platform**: Support for CPU and CUDA\n\n## Visualization\n\n![Visualization](./vis.gif)\n\n## Installation\n\n```bash\npip install phocr\n```\n\n## Quick Start\n\n```python\nfrom phocr import PHOCR\n\n# Initialize OCR engine\nengine = PHOCR()\n\n# Perform OCR on image\nresult = engine(\"path/to/image.jpg\")\nprint(result)\n\n# Visualize results\nresult.vis(\"output.jpg\")\nprint(result.to_markdown())\n```\n\n## Benchmarks\n\nWe conducted comprehensive benchmarks comparing PHOCR with leading OCR solutions across multiple languages and scenarios. 
**Our custom-developed PH-OCRv1 model demonstrates significant improvements over existing solutions.**\n\n### Overall Performance Comparison\n\n<table style=\"width: 90%; margin: auto; border-collapse: collapse; font-size: small;\">\n  <thead>\n    <tr>\n      <th rowspan=\"2\">Model</th>\n      <th colspan=\"4\">ZH & EN<br><span style=\"font-weight: normal; font-size: x-small;\">CER \u2193</span></th>\n      <th colspan=\"2\">JP<br><span style=\"font-weight: normal; font-size: x-small;\">CER \u2193</span></th>\n      <th colspan=\"2\">KO<br><span style=\"font-weight: normal; font-size: x-small;\">CER \u2193</span></th>\n      <th colspan=\"1\">RU<br><span style=\"font-weight: normal; font-size: x-small;\">CER \u2193</span></th>\n    </tr>\n    <tr>\n      <th><i>English</i></th>\n      <th><i>Simplified Chinese</i></th>\n      <th><i>EN CH Mixed</i></th>\n      <th><i>Traditional Chinese</i></th>\n      <th><i>Document</i></th>\n      <th><i>Scene</i></th>\n      <th><i>Document</i></th>\n      <th><i>Scene</i></th>\n      <th><i>Document</i></th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <td>PHOCR</td>\n      <td><strong>0.0008</strong></td>\n      <td><strong>0.0057</strong></td>\n      <td><strong>0.0171</strong></td>\n      <td><strong>0.0145</strong></td>\n      <td><strong>0.0039</strong></td>\n      <td><strong>0.0197</strong></td>\n      <td><strong>0.0050</strong></td>\n      <td><strong>0.0255</strong></td>\n      <td><strong>0.0046</strong></td>\n    </tr>\n    <tr>\n      <td>Baidu</td>\n      <td>0.0014</td>\n      <td>0.0069</td>\n      <td>0.0354</td>\n      <td>0.0431</td>\n      <td>0.0222</td>\n      <td>0.0607</td>\n      <td>0.0238</td>\n      <td>0.212</td>\n      <td>0.0786</td>\n    </tr>\n    <tr>\n      <td>Ali</td>\n      <td>-</td>\n      <td>-</td>\n      <td>-</td>\n      <td>-</td>\n      <td>0.0272</td>\n      <td>0.0564</td>\n      <td>0.0159</td>\n      <td>0.102</td>\n      <td>0.0616</td>\n    </tr>\n  </tbody>\n</table>\n\n\nNotice\n\n- baidu: [Baidu Accurate API](https://ai.baidu.com/tech/ocr/general)\n- Ali: [Aliyun API](https://help.aliyun.com/zh/ocr/product-overview/recognition-of-characters-in-languages-except-for-chinese-and-english-1)\n- CER: the total edit distance divided by the total number of characters in the ground truth.\n\n\n## Advanced Usage\n\nWith global KV cache enabled, we implement a simple version using PyTorch (CUDA). 
When running with torch (CUDA), you can enable caching by setting `use_cache=True` in `ORTSeq2Seq(...)`, which also allows for larger batch sizes.\n\n### Language-specific Configuration\n\nSee [demo.py](./demo.py) for more examples.\n\n## Evaluation & Benchmarking\n\nPHOCR provides comprehensive benchmarking tools to evaluate model performance across different languages and scenarios.\n\n### Quick Benchmark\n\nRun the complete benchmark pipeline:\n```bash\nsh benchmark/run_recognition.sh\n```\n\nCalculate Character Error Rate (CER) for model predictions:\n```bash\nsh benchmark/run_score.sh\n```\n\n### Benchmark Datasets\n\nPHOCR uses standardized benchmark datasets for fair comparison:\n\n- **zh_en_rec_bench** [Chinese & English mixed text recognition](https://huggingface.co/datasets/puhuilab/zh_en_rec_bench)\n- **jp_rec_bench** [Japanese text recognition](https://huggingface.co/datasets/puhuilab/jp_rec_bench)\n- **ko_rec_bench** [Korean text recognition](https://huggingface.co/datasets/puhuilab/ko_rec_bench)\n- **ru_rec_bench** [Russian text recognition](https://huggingface.co/datasets/puhuilab/ru_rec_bench)\n\n## Further Improvements\n\n- Character error rate (CER), including punctuation, can be further reduced through additional normalization of the training corpus.\n- Text detection accuracy can be further enhanced by employing a more advanced detection framework.\n\n## Contributing\n\nWe welcome contributions! Please feel free to submit issues, feature requests, or pull requests.\n\n## Support\n\nFor questions and support, please open an issue on GitHub or contact the maintainers.\n\n## Acknowledgements\n\nMany thanks to [RapidOCR](https://github.com/RapidAI/RapidOCR) for detection and main framework.\n\n## License\n\n- This project is released under the Apache 2.0 license\n- The copyright of the OCR detection and classification model is held by Baidu\n- The PHOCR recognition models are under the modified MIT License - see the [LICENSE](./LICENSE) file for details\n\n## Citation\n\nIf you use PHOCR in your research, please cite:\n\n```bibtex\n@misc{phocr2025,\n  title={PHOCR: High-Performance OCR Toolkit},\n  author={PuHui Lab},\n  year={2025},\n  url={https://github.com/puhuilab/phocr}\n}\n```\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "High-Performance OCR Toolkit",
    "version": "1.0.0",
    "project_urls": {
        "Bug Reports": "https://github.com/puhuilab/phocr/issues",
        "Documentation": "https://github.com/puhuilab/phocr#readme",
        "Homepage": "https://github.com/puhuilab/phocr",
        "Source": "https://github.com/puhuilab/phocr"
    },
    "split_keywords": [
        "ocr",
        " text recognition",
        " computer vision",
        " deep learning"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "d65748d0ba7a617db99bc27a50e4229fcfca958aa41055a8c859b883bdff1ff2",
                "md5": "1a720a7c91f70161868a0ef865c7a8f9",
                "sha256": "4b5dabe92416930cb377b501ec1017842f6af9874b38bd28ebf34d4b092db8cf"
            },
            "downloads": -1,
            "filename": "phocr-1.0.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "1a720a7c91f70161868a0ef865c7a8f9",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 51188,
            "upload_time": "2025-07-13T13:40:52",
            "upload_time_iso_8601": "2025-07-13T13:40:52.740205Z",
            "url": "https://files.pythonhosted.org/packages/d6/57/48d0ba7a617db99bc27a50e4229fcfca958aa41055a8c859b883bdff1ff2/phocr-1.0.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "dd052d63f6e96c66520b97b4d1e85d5f1da7c6d761dba60b70bc8e44410d4f14",
                "md5": "2c0ac6bcc39dd3acf36ac6743825b9be",
                "sha256": "052c11bb6e9d4a295ce2cdc28a5ebc0e4a23f269fbe42ec03abd089d64ba647a"
            },
            "downloads": -1,
            "filename": "phocr-1.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "2c0ac6bcc39dd3acf36ac6743825b9be",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 45521,
            "upload_time": "2025-07-13T13:40:54",
            "upload_time_iso_8601": "2025-07-13T13:40:54.600487Z",
            "url": "https://files.pythonhosted.org/packages/dd/05/2d63f6e96c66520b97b4d1e85d5f1da7c6d761dba60b70bc8e44410d4f14/phocr-1.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-13 13:40:54",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "puhuilab",
    "github_project": "phocr",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "pyclipper",
            "specs": [
                [
                    ">=",
                    "1.2.0"
                ]
            ]
        },
        {
            "name": "opencv_python",
            "specs": [
                [
                    ">=",
                    "4.5.1.48"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": [
                [
                    ">=",
                    "1.19.5"
                ],
                [
                    "<=",
                    "1.26.4"
                ]
            ]
        },
        {
            "name": "six",
            "specs": [
                [
                    ">=",
                    "1.15.0"
                ]
            ]
        },
        {
            "name": "Shapely",
            "specs": [
                [
                    "!=",
                    "2.0.4"
                ],
                [
                    ">=",
                    "1.7.1"
                ]
            ]
        },
        {
            "name": "PyYAML",
            "specs": []
        },
        {
            "name": "Pillow",
            "specs": []
        },
        {
            "name": "tqdm",
            "specs": []
        },
        {
            "name": "datasets",
            "specs": []
        },
        {
            "name": "omegaconf",
            "specs": []
        },
        {
            "name": "colorlog",
            "specs": []
        },
        {
            "name": "onnx",
            "specs": []
        },
        {
            "name": "Levenshtein",
            "specs": []
        },
        {
            "name": "regex",
            "specs": []
        }
    ],
    "lcname": "phocr"
}
        