# Fast MRZ
![License](https://img.shields.io/badge/license-AGPL%203.0-34D058?color=blue)
![Downloads](https://static.pepy.tech/badge/fastmrz)
![Python](https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C%203.10%20%7C%203.11%20%7C%203.12-blue?logo=python&logoColor=959DA5)
[![CodeQL](https://github.com/sivakumar-mahalingam/fastmrz/actions/workflows/codeql.yml/badge.svg)](https://github.com/sivakumar-mahalingam/fastmrz/actions/workflows/codeql.yml)
[![PyPI](https://img.shields.io/pypi/v/fastmrz.svg?logo=pypi&logoColor=959DA5&color=blue)](https://pypi.org/project/fastmrz/)
<a href="https://github.com/sivakumar-mahalingam/fastmrz/" target="_blank">
<img src="https://raw.githubusercontent.com/sivakumar-mahalingam/fastmrz/main/docs/FastMRZ.png" target="_blank" />
</a>
FastMRZ extracts the Machine Readable Zone (MRZ) from document images. The MRZ encodes key details such as the document holder's name, nationality, document number, and date of birth.
**Features:**
- Detects and extracts the MRZ region from document images
- Uses contour detection to locate the MRZ area
- Custom-trained models for ONNX and Tesseract
- Validates extracted data with checksum logic
- Outputs the extracted MRZ as raw text or JSON for further processing or analysis
## Built With
![OpenCV](https://img.shields.io/badge/OpenCV-27338e?style=for-the-badge&logo=OpenCV&logoColor=white)
![Tesseract OCR](https://img.shields.io/badge/Tesseract%20OCR-0F9D58?style=for-the-badge&logo=google&logoColor=white)
![NumPy](https://img.shields.io/badge/numpy-316192?style=for-the-badge&logo=numpy&logoColor=white)
![ONNX](https://img.shields.io/badge/ONNX-7B7B7B?style=for-the-badge&logo=onnx&logoColor=white)
## Installation
1. Install the [Tesseract OCR](https://tesseract-ocr.github.io/tessdoc/Installation.html) engine and add its executable to your `PATH` variable.
2. Install `fastmrz`
```bash
pip install fastmrz
```
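If you prefer conda, Tesseract can be installed into a dedicated environment instead. The commands below create one named `fastmrz` from conda-forge; the `fastmrz` package itself is still installed with `pip`.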
```bash
conda create -n fastmrz tesseract -c conda-forge
conda activate fastmrz
```
3. Copy [`mrz.traineddata`](https://github.com/sivakumar-mahalingam/fastmrz/raw/main/tessdata/mrz.traineddata) from the `tessdata` folder of the repo into the `tessdata` folder of your Tesseract installation.
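Before running FastMRZ, the setup from the steps above can be checked with a few lines of Python. This is only a sketch and not part of `fastmrz`: the helper names `find_tesseract` and `has_mrz_model` are hypothetical, and the `tessdata` location is passed in by the caller because it differs between platforms and install methods.

```python
import shutil
from pathlib import Path
from typing import Optional


def find_tesseract() -> Optional[str]:
    """Return the full path of the tesseract executable found on PATH, or None."""
    return shutil.which("tesseract")


def has_mrz_model(tessdata_dir: str) -> bool:
    """Return True if mrz.traineddata exists in the given tessdata folder."""
    return (Path(tessdata_dir) / "mrz.traineddata").is_file()


if __name__ == "__main__":
    exe = find_tesseract()
    print("tesseract:", exe if exe else "not found on PATH")
```

If `find_tesseract()` returns `None`, the executable path can be passed to `FastMRZ(tesseract_path=...)` as shown in the example below.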
## Example
```Python
from fastmrz import FastMRZ
import json
fast_mrz = FastMRZ()
# If Tesseract is not on the PATH variable, pass the path to its executable instead:
# fast_mrz = FastMRZ(tesseract_path=r'/opt/homebrew/Cellar/tesseract/5.3.4_1/bin/tesseract')  # Default path on Mac
# fast_mrz = FastMRZ(tesseract_path=r'C:\Program Files\Tesseract-OCR\tesseract.exe')  # Default path on Windows
passport_mrz = fast_mrz.get_mrz("../data/passport_uk.jpg")
print("JSON:")
print(json.dumps(passport_mrz, indent=4))
print("\n")
passport_mrz = fast_mrz.get_mrz("../data/passport_uk.jpg", raw=True)
print("TEXT:")
print(passport_mrz)
```
**OUTPUT:**
```Console
JSON:
{
    "mrz_type": "TD3",
    "document_type": "P",
    "country_code": "GBR",
    "surname": "PUDARSAN",
    "given_name": "HENERT",
    "document_number": "707797979",
    "nationality": "GBR",
    "date_of_birth": "1995-05-20",
    "sex": "M",
    "date_of_expiry": "2017-04-22",
    "status": "SUCCESS"
}


TEXT:
P<GBRPUDARSAN<<HENERT<<<<<<<<<<<<<<<<<<<<<<<
7077979792GBR9505209M1704224<<<<<<<<<<<<<<00
```
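The check digits in the second line of the raw text can be verified by hand. The sketch below follows the check-digit scheme from ICAO Doc 9303 (weights 7, 3, 1 repeating; digits keep their value, `A` to `Z` map to 10 to 35, and `<` counts as 0). It is an independent illustration rather than FastMRZ's internal code, and the function names are hypothetical.

```python
def char_value(ch: str) -> int:
    """Map one MRZ character to its numeric value for check-digit purposes."""
    if "0" <= ch <= "9":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    if ch == "<":
        return 0
    raise ValueError(f"invalid MRZ character: {ch!r}")


def check_digit(field: str) -> int:
    """Weighted sum modulo 10, with weights 7, 3, 1 repeating."""
    weights = (7, 3, 1)
    return sum(char_value(c) * weights[i % 3] for i, c in enumerate(field)) % 10


# Second MRZ line from the example output above (TD3, 44 characters).
line2 = "7077979792GBR9505209M1704224<<<<<<<<<<<<<<00"

checks = {
    "document_number": check_digit(line2[0:9]) == int(line2[9]),
    "date_of_birth": check_digit(line2[13:19]) == int(line2[19]),
    "date_of_expiry": check_digit(line2[21:27]) == int(line2[27]),
    # The composite digit covers the fields above together with their check digits.
    "composite": check_digit(line2[0:10] + line2[13:20] + line2[21:43]) == int(line2[43]),
}
print(checks)
```

For the sample passport every entry is `True`, which matches the `"status": "SUCCESS"` field in the JSON output.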
## MRZ Wiki
<details>
<summary><b>MRZ Types & Format</b></summary>
The MRZ format is strictly regulated and must comply with [Doc 9303](https://www.icao.int/publications/pages/publication.aspx?docnum=9303), Machine Readable Travel Documents, published by the International Civil Aviation Organization (ICAO).
There are currently several types of ICAO standard machine-readable zones, which vary in the number of lines and characters in each line:
- TD-1 (e.g. citizen’s identification card, EU ID card, US Green Card): 3 lines of 30 characters each.
- TD-2 (e.g. Romanian ID, older German ID) and MRV-B (machine-readable visa type B, e.g. Schengen visa): 2 lines of 36 characters each.
- TD-3 (international passports, also known as MRP) and MRV-A (machine-readable visa type A, issued by the USA, Japan, China, and others): 2 lines of 44 characters each.
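Because each format has a distinct shape, the line count and line length are enough to narrow down the type. The sketch below relies only on the sizes listed above plus one assumption taken from Doc 9303 rather than from this README: visa MRZs start with the document code `V`. The function name `guess_mrz_format` is hypothetical and not part of `fastmrz`.

```python
from typing import List


def guess_mrz_format(lines: List[str]) -> str:
    """Guess the MRZ format from the number of lines and their common length."""
    rows = [ln.strip() for ln in lines if ln.strip()]
    if not rows:
        return "unknown"
    widths = {len(r) for r in rows}
    if len(widths) != 1:  # all MRZ lines must have the same length
        return "unknown"
    shape = (len(rows), widths.pop())
    is_visa = rows[0].startswith("V")  # assumption: visas use document code "V"
    if shape == (3, 30):
        return "TD-1"
    if shape == (2, 36):
        return "MRV-B" if is_visa else "TD-2"
    if shape == (2, 44):
        return "MRV-A" if is_visa else "TD-3"
    return "unknown"


# The passport from the example: name line padded with "<" to 44 characters.
line1 = "P<GBRPUDARSAN<<HENERT".ljust(44, "<")
line2 = "7077979792GBR9505209M1704224<<<<<<<<<<<<<<00"
print(guess_mrz_format([line1, line2]))
```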
Now, based on the example of a national passport, let us take a closer look at the MRZ composition.
![MRZ fields distribution](https://raw.githubusercontent.com/sivakumar-mahalingam/fastmrz/main/docs/mrz_fields_distribution.png)
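For a TD-3 document, the field layout in the figure can be expressed as fixed character ranges. The sketch below slices the two lines of the sample passport; the offsets follow the TD-3 layout in Doc 9303, the helper `parse_td3` is hypothetical, and this is not necessarily how `fastmrz` parses the text internally. Dates are left in their raw `YYMMDD` form because the century is not encoded in the MRZ.

```python
from typing import Dict


def parse_td3(line1: str, line2: str) -> Dict[str, str]:
    """Split a TD-3 MRZ (2 lines of 44 characters) into its main fields."""
    if len(line1) != 44 or len(line2) != 44:
        raise ValueError("a TD-3 MRZ needs two lines of 44 characters")
    # Surname and given names are separated by "<<"; a single "<" stands for a space.
    surname, _, given = line1[5:44].partition("<<")
    return {
        "document_type": line1[0:2].rstrip("<"),
        "country_code": line1[2:5].replace("<", ""),
        "surname": surname.replace("<", " ").strip(),
        "given_name": given.replace("<", " ").strip(),
        "document_number": line2[0:9].rstrip("<"),
        "nationality": line2[10:13].replace("<", ""),
        "date_of_birth": line2[13:19],   # YYMMDD
        "sex": line2[20],
        "date_of_expiry": line2[21:27],  # YYMMDD
    }


line1 = "P<GBRPUDARSAN<<HENERT".ljust(44, "<")
line2 = "7077979792GBR9505209M1704224<<<<<<<<<<<<<<00"
fields = parse_td3(line1, line2)
for name, value in fields.items():
    print(f"{name}: {value}")
```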
</details>
![MRZ GIF](https://raw.githubusercontent.com/sivakumar-mahalingam/fastmrz/main/docs/mrz.gif)
## ToDo
- [ ] Test for **mrva** and **mrvb** documents
- [ ] Add `wiki` page
## License
Distributed under the AGPL-3.0 License. See `LICENSE` for more information.
## Show your support
Give a ⭐️ if <a href="https://github.com/sivakumar-mahalingam/fastmrz/">this</a> project helped you!