fastmrz


- Name: fastmrz
- Version: 1.3
- Home page: https://github.com/sivakumar-mahalingam/fastmrz
- Summary: Extracts the Machine Readable Zone (MRZ) data from document images
- Upload time: 2024-10-12 15:00:21
- Author: Sivakumar Mahalingam
- Requires Python: >=3.8
- License: AGPLv3
- Keywords: fastmrz, mrz, image processing, image recognition, ocr, computer vision, text recognition, text detection, artificial intelligence, onnx
# Fast MRZ

![License](https://img.shields.io/badge/license-AGPL%203.0-34D058?color=blue)
![Downloads](https://static.pepy.tech/badge/fastmrz)
![Python](https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C%203.10%20%7C%203.11%20%7C%203.12-blue?logo=python&logoColor=959DA5)
[![CodeQL](https://github.com/sivakumar-mahalingam/fastmrz/actions/workflows/codeql.yml/badge.svg)](https://github.com/sivakumar-mahalingam/fastmrz/actions/workflows/codeql.yml)
[![PyPI](https://img.shields.io/pypi/v/fastmrz.svg?logo=pypi&logoColor=959DA5&color=blue)](https://pypi.org/project/fastmrz/)

<a href="https://github.com/sivakumar-mahalingam/fastmrz/" target="_blank">
    <img src="https://raw.githubusercontent.com/sivakumar-mahalingam/fastmrz/main/docs/FastMRZ.png" target="_blank" />
</a>

Fast MRZ extracts the Machine Readable Zone (MRZ) from document images. The MRZ typically contains key information such as the document holder's name, nationality, document number, date of birth, sex, and the document's expiry date.

**Features:**

- Detects and extracts the MRZ region from document images
- Uses contour detection to locate the MRZ area
- Ships custom-trained models for ONNX and Tesseract
- Validates the extracted data with checksum (check-digit) logic
- Outputs the extracted MRZ as text or JSON for further processing or analysis


## Built With

![OpenCV](https://img.shields.io/badge/OpenCV-27338e?style=for-the-badge&logo=OpenCV&logoColor=white)
![Tesseract OCR](https://img.shields.io/badge/Tesseract%20OCR-0F9D58?style=for-the-badge&logo=google&logoColor=white)
![NumPy](https://img.shields.io/badge/numpy-316192?style=for-the-badge&logo=numpy&logoColor=white)
![ONNX](https://img.shields.io/badge/ONNX-7B7B7B?style=for-the-badge&logo=onnx&logoColor=white)

## Installation


1. Install the [Tesseract OCR](https://tesseract-ocr.github.io/tessdoc/Installation.html) engine and add its executable to your `PATH` variable.

2. Install `fastmrz`:
    ```bash
    pip install fastmrz
    ```
    If you prefer conda, you can create a dedicated environment with Tesseract from conda-forge:

    ```bash
    conda create -n fastmrz tesseract -c conda-forge
    conda activate fastmrz
    ```

3. Download [`mrz.traineddata`](https://github.com/sivakumar-mahalingam/fastmrz/raw/main/tessdata/mrz.traineddata) from the `tessdata` folder of the repository and copy it into the `tessdata` folder of your Tesseract installation.
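
To check that steps 1 and 3 worked, one option is to ask Tesseract which language models it can see. The sketch below is not part of `fastmrz`. It assumes the `tesseract` executable accepts the `--list-langs` flag and that the copied model is listed under the name `mrz` (taken from the file name `mrz.traineddata`). Both helper names are hypothetical.

```python
import shutil
import subprocess


def has_mrz_model(list_langs_output: str) -> bool:
    """Return True if a model named 'mrz' appears in `tesseract --list-langs` output."""
    # Each installed model is printed on its own line after a header line.
    return any(line.strip() == "mrz" for line in list_langs_output.splitlines())


def check_tesseract_setup() -> str:
    """Describe the state of the local Tesseract installation."""
    exe = shutil.which("tesseract")
    if exe is None:
        return "tesseract executable not found on PATH"
    result = subprocess.run([exe, "--list-langs"], capture_output=True, text=True)
    # Some Tesseract versions print the list to stderr, so inspect both streams.
    if has_mrz_model(result.stdout + "\n" + result.stderr):
        return "mrz.traineddata is visible to Tesseract"
    return "tesseract found, but no 'mrz' model in its tessdata folder"


if __name__ == "__main__":
    print(check_tesseract_setup())
```

If the executable is not on `PATH`, the `tesseract_path` argument shown in the Example section is the alternative.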

## Example

```Python
from fastmrz import FastMRZ
import json

fast_mrz = FastMRZ()
# If Tesseract is not on your PATH, pass the path to its executable instead:
# fast_mrz = FastMRZ(tesseract_path=r'/opt/homebrew/Cellar/tesseract/5.3.4_1/bin/tesseract') # Default path on macOS
# fast_mrz = FastMRZ(tesseract_path=r'C:\Program Files\Tesseract-OCR\tesseract.exe') # Default path on Windows
passport_mrz = fast_mrz.get_mrz("../data/passport_uk.jpg")
print("JSON:")
print(json.dumps(passport_mrz, indent=4))

print("\n")

passport_mrz = fast_mrz.get_mrz("../data/passport_uk.jpg", raw=True)
print("TEXT:")
print(passport_mrz)
```

**OUTPUT:**
```Console
JSON:
{
    "mrz_type": "TD3",
    "document_type": "P",
    "country_code": "GBR",
    "surname": "PUDARSAN",
    "given_name": "HENERT",
    "document_number": "707797979",
    "nationality": "GBR",
    "date_of_birth": "1995-05-20",
    "sex": "M",
    "date_of_expiry": "2017-04-22",
    "status": "SUCCESS"
}


TEXT:
P<GBRPUDARSAN<<HENERT<<<<<<<<<<<<<<<<<<<<<<<
7077979792GBR9505209M1704224<<<<<<<<<<<<<<00
```
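
The second line of the `TEXT` output carries a check digit after the document number, the date of birth, and the date of expiry. To illustrate the kind of checksum validation listed under Features, here is a minimal sketch of the ICAO Doc 9303 check-digit rule: weights 7, 3, 1 repeat across the field, digits keep their value, letters A to Z map to 10 to 35, and the filler `<` counts as 0. The function name `check_digit` is hypothetical, and this is not necessarily how `fastmrz` implements its validation.

```python
def check_digit(field: str) -> int:
    """Compute the ICAO Doc 9303 check digit for an MRZ field."""
    weights = (7, 3, 1)
    total = 0
    for i, ch in enumerate(field):
        if "0" <= ch <= "9":
            value = int(ch)
        elif "A" <= ch <= "Z":
            value = ord(ch) - ord("A") + 10
        elif ch == "<":
            value = 0
        else:
            raise ValueError(f"invalid MRZ character: {ch!r}")
        total += value * weights[i % 3]
    return total % 10


# Fields taken from the sample output above.
print(check_digit("707797979"))  # document number
print(check_digit("950520"))     # date of birth (YYMMDD)
print(check_digit("170422"))     # date of expiry (YYMMDD)
```

For the sample passport these evaluate to 2, 9, and 4, which match the digits that follow each field in the raw MRZ line.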

## MRZ Wiki

<details>
    <summary><b>MRZ Types & Format</b></summary>

The MRZ format is strictly regulated and must comply with [Doc 9303](https://www.icao.int/publications/pages/publication.aspx?docnum=9303), Machine Readable Travel Documents, published by the International Civil Aviation Organization (ICAO).

There are currently several types of ICAO standard machine-readable zones, which vary in the number of lines and characters in each line:

- TD-1 (e.g. citizen’s identification card, EU ID card, US Green Card): 3 lines of 30 characters each.
- TD-2 (e.g. Romania ID, old type of German ID) and MRV-B (machine-readable visas type B, e.g. Schengen visa): 2 lines of 36 characters each.
- TD-3 (international passports, also known as MRP) and MRV-A (machine-readable visas type A, issued by the USA, Japan, China, and others): 2 lines of 44 characters each.
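
These dimensions are enough to make a first guess at the format of a raw MRZ string. The sketch below is a hypothetical helper, not part of the `fastmrz` API. It assumes the lines are separated by newlines and that visas are marked by a leading `V`, as specified in Doc 9303. The labels `MRVA` and `MRVB` are made up for illustration, following the `TD3` style of the `mrz_type` field.

```python
def guess_mrz_type(raw_mrz: str) -> str:
    """Guess the MRZ format from the number of lines and their length."""
    lines = [line.strip() for line in raw_mrz.strip().splitlines()]
    if not lines or len({len(line) for line in lines}) != 1:
        return "UNKNOWN"  # empty input or lines of unequal length
    shape = (len(lines), len(lines[0]))
    is_visa = lines[0].startswith("V")
    if shape == (3, 30):
        return "TD1"
    if shape == (2, 36):
        return "MRVB" if is_visa else "TD2"
    if shape == (2, 44):
        return "MRVA" if is_visa else "TD3"
    return "UNKNOWN"


# Sample passport MRZ, padded to 44 characters per line with the '<' filler.
sample = "\n".join([
    "P<GBRPUDARSAN<<HENERT".ljust(44, "<"),
    "7077979792GBR9505209M1704224" + "<" * 14 + "00",
])
print(guess_mrz_type(sample))
```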

The figure below shows how the fields are laid out in the MRZ of a passport.

![MRZ fields distribution](https://raw.githubusercontent.com/sivakumar-mahalingam/fastmrz/main/docs/mrz_fields_distribution.png)
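
For a TD-3 document, the field positions defined in Doc 9303 can be expressed as fixed slices of the two 44-character lines. The following sketch is illustrative only. The function name and the dictionary keys are assumptions modelled on the JSON output in the Example section, dates are left in their raw `YYMMDD` form, and no check digits are validated.

```python
def split_td3(line1: str, line2: str) -> dict:
    """Slice the two lines of a TD-3 MRZ into their raw fields."""
    if len(line1) != 44 or len(line2) != 44:
        raise ValueError("TD-3 lines must be 44 characters each")
    # Names are stored as SURNAME<<GIVEN<NAMES, padded with '<'.
    surname, _, given = line1[5:44].partition("<<")
    return {
        "document_type": line1[0:2].rstrip("<"),
        "country_code": line1[2:5],
        "surname": surname.replace("<", " ").strip(),
        "given_name": given.replace("<", " ").strip(),
        "document_number": line2[0:9].rstrip("<"),
        "nationality": line2[10:13],
        "date_of_birth": line2[13:19],   # YYMMDD
        "sex": line2[20],
        "date_of_expiry": line2[21:27],  # YYMMDD
    }


fields = split_td3(
    "P<GBRPUDARSAN<<HENERT".ljust(44, "<"),
    "7077979792GBR9505209M1704224" + "<" * 14 + "00",
)
print(fields["surname"], fields["given_name"], fields["document_number"])
```

The characters skipped between the slices (positions 10, 20, and 28 of the second line) hold the check digits for the preceding fields.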

</details>

![MRZ GIF](https://raw.githubusercontent.com/sivakumar-mahalingam/fastmrz/main/docs/mrz.gif)

## ToDo

- [ ] Test for **MRV-A** and **MRV-B** documents
- [ ] Add `wiki` page

## License

Distributed under the AGPL-3.0 License. See `LICENSE` for more information.

## Show your support

Give a ⭐️ if <a href="https://github.com/sivakumar-mahalingam/fastmrz/">this</a> project helped you!


            
