fastmrz


| Field | Value |
| --- | --- |
| Name | fastmrz |
| Version | 2.1.1 |
| Home page | https://github.com/sivakumar-mahalingam/fastmrz/ |
| Summary | Extracts the Machine Readable Zone (MRZ) data from document images |
| Upload time | 2025-02-19 02:49:24 |
| Maintainer | None |
| Docs URL | None |
| Author | Sivakumar Mahalingam |
| Requires Python | >=3.8 |
| License | AGPLv3 |
| Keywords | fastmrz, mrz, image processing, image recognition, ocr, computer vision, text recognition, text detection, artificial intelligence, onnx |
| Requirements | No requirements were recorded. |

# Fast MRZ

<div align="center">

[![License](https://img.shields.io/badge/license-AGPL%203.0-34D058?color=blue)](https://github.com/sivakumar-mahalingam/fastmrz/blob/main/LICENSE)
[![Downloads](https://static.pepy.tech/badge/fastmrz)](https://pypistats.org/packages/fastmrz)
![Python](https://img.shields.io/badge/python-3.8+-blue?logo=python&logoColor=959DA5)
[![CodeQL](https://github.com/sivakumar-mahalingam/fastmrz/actions/workflows/codeql.yml/badge.svg)](https://github.com/sivakumar-mahalingam/fastmrz/actions/workflows/codeql.yml)
[![PyPI](https://img.shields.io/pypi/v/fastmrz.svg?logo=pypi&logoColor=959DA5&color=blue)](https://pypi.org/project/fastmrz/)

<a href="https://github.com/sivakumar-mahalingam/fastmrz/" target="_blank">
    <img src="https://raw.githubusercontent.com/sivakumar-mahalingam/fastmrz/main/docs/FastMRZ.png" target="_blank" />
</a>

FastMRZ is an open-source Python package that extracts the Machine Readable Zone (MRZ) from passports and other documents. It accepts several input formats: an image file, a Base64 string, an MRZ string, or a NumPy array.

[Features](#features) •
[Built With](#built-with) •
[Prerequisites](#prerequisites) •
[Installation](#installation) •
[Example](#example) •
[Wiki](#wiki) •
[ToDo](#todo) •
[Contributing](#contributing)

</div>

## ✨Features

- 👁️Detects and extracts the MRZ region from document images
- 🔍Uses contour detection to accurately identify the MRZ area
- 🎨Uses custom-trained models in ONNX format
- 🆗Includes check-digit (checksum) logic for data validation
- 📤Outputs the extracted MRZ region as text or JSON


## 🛠️Built With

![OpenCV](https://img.shields.io/badge/OpenCV-27338e?style=for-the-badge&logo=OpenCV&logoColor=white)
![Tesseract OCR](https://img.shields.io/badge/Tesseract%20OCR-0F9D58?style=for-the-badge&logo=google&logoColor=white)
![NumPy](https://img.shields.io/badge/numpy-316192?style=for-the-badge&logo=numpy&logoColor=white)
![ONNX](https://img.shields.io/badge/ONNX-7B7B7B?style=for-the-badge&logo=onnx&logoColor=white)

## 🚨Prerequisites
- Install the [Tesseract OCR](https://tesseract-ocr.github.io/tessdoc/Installation.html) engine, add its executable to your `PATH`, and make sure `tesseract` can be run from the command line.
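
One way to check this prerequisite from Python is sketched below. It uses only the standard library; the helper names `find_tesseract` and `tesseract_version` are made up for illustration and are not part of FastMRZ.

```python
import shutil
import subprocess


def find_tesseract():
    """Return the full path of the tesseract executable on PATH, or None."""
    return shutil.which("tesseract")


def tesseract_version(executable):
    """Return the first line printed by `tesseract --version`."""
    result = subprocess.run(
        [executable, "--version"], capture_output=True, text=True, check=True
    )
    # Some Tesseract builds print the version to stderr rather than stdout.
    output = result.stdout or result.stderr
    return output.splitlines()[0] if output else ""


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract was not found on PATH")
    else:
        print(path, "->", tesseract_version(path))
```

If the executable is not found, its location can be passed explicitly through the `tesseract_path` argument shown in the Example section.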

## ⚙️Installation

1. Install `fastmrz`:
    ```bash
    pip install fastmrz
    ```
    If you prefer conda, Tesseract can be installed into a conda environment from conda-forge:

    ```bash
    conda create -n fastmrz tesseract -c conda-forge
    conda activate fastmrz
    ```

2. Copy the `mrz.traineddata` file from the `tessdata` folder of the [repository](https://github.com/sivakumar-mahalingam/fastmrz/raw/main/tessdata/mrz.traineddata) into the `tessdata` folder of the Tesseract installation on **YOUR MACHINE**.
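
The copy step can also be scripted. The sketch below assumes you have already downloaded `mrz.traineddata` and know where your Tesseract `tessdata` folder is; that location varies by platform and installation method, so it is passed in explicitly. The helper name `install_traineddata` is made up for illustration.

```python
import shutil
from pathlib import Path


def install_traineddata(source_file, tessdata_dir):
    """Copy a .traineddata file into a Tesseract tessdata directory.

    Both paths are supplied by the caller; nothing is auto-detected.
    """
    source = Path(source_file)
    target_dir = Path(tessdata_dir)
    if not source.is_file():
        raise FileNotFoundError(f"traineddata file not found: {source}")
    if not target_dir.is_dir():
        raise NotADirectoryError(f"tessdata directory not found: {target_dir}")
    # copy2 preserves file metadata and returns the destination path
    return Path(shutil.copy2(source, target_dir / source.name))
```

A call would look like `install_traineddata("mrz.traineddata", "/path/to/tessdata")`, where the second argument is a placeholder for your own installation. Writing into a system-wide `tessdata` folder may require elevated permissions.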

## 💡Example

```Python
from fastmrz import FastMRZ
import json

fast_mrz = FastMRZ()
# If Tesseract is not on PATH, pass the path of its executable instead:
# fast_mrz = FastMRZ(tesseract_path=r'/opt/homebrew/Cellar/tesseract/5.3.4_1/bin/tesseract')  # Example Homebrew path on macOS
# fast_mrz = FastMRZ(tesseract_path=r'C:\Program Files\Tesseract-OCR\tesseract.exe')  # Default path on Windows

# Parse the MRZ into a dictionary of fields
passport_mrz = fast_mrz.get_details("../data/passport_uk.jpg", include_checkdigit=False)
print("JSON:")
print(json.dumps(passport_mrz, indent=4))

print("\n")

# With ignore_parse=True, return the raw MRZ text instead of parsed fields
passport_mrz = fast_mrz.get_details("../data/passport_uk.jpg", ignore_parse=True)
print("TEXT:")
print(passport_mrz)
```

**OUTPUT:**
```Console
JSON:
{
    "mrz_type": "TD3",
    "document_code": "P",
    "issuer_code": "GBR",
    "surname": "PUDARSAN",
    "given_name": "HENERT",
    "document_number": "707797979",
    "document_number_checkdigit": "2",
    "nationality_code": "GBR",
    "birth_date": "1995-05-20",
    "sex": "M",
    "expiry_date": "2017-04-22",
    "optional_data": "",
    "mrz_text": "P<GBRPUDARSAN<<HENERT<<<<<<<<<<<<<<<<<<<<<<<\n7077979792GBR9505209M1704224<<<<<<<<<<<<<<00",
    "status": "SUCCESS"
}


TEXT:
P<GBRPUDARSAN<<HENERT<<<<<<<<<<<<<<<<<<<<<<<
7077979792GBR9505209M1704224<<<<<<<<<<<<<<00
```
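
As a hedged sketch of how the parsed result might be used downstream, the snippet below works on a dictionary shaped like the JSON output above. It does not call FastMRZ; the helper name `summarize_result` is made up for illustration, and `"SUCCESS"` is the only status value taken from the sample output.

```python
from datetime import date


def summarize_result(details, today):
    """Return a small summary of a parsed MRZ result, or None if parsing failed."""
    if details.get("status") != "SUCCESS":
        return None
    # Dates in the sample output are ISO formatted (YYYY-MM-DD)
    expiry = date.fromisoformat(details["expiry_date"])
    return {
        "full_name": f'{details["given_name"]} {details["surname"]}',
        "document_number": details["document_number"],
        "expired": expiry < today,
    }


# Values copied from the sample output above.
sample = {
    "surname": "PUDARSAN",
    "given_name": "HENERT",
    "document_number": "707797979",
    "expiry_date": "2017-04-22",
    "status": "SUCCESS",
}
print(summarize_result(sample, today=date(2025, 2, 19)))
```

Passing `today` explicitly keeps the function deterministic and easy to test.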

## 📃Wiki

<details>
    <summary><b>MRZ Types & Format</b></summary>

The MRZ format is strictly regulated and must comply with [Doc 9303](https://www.icao.int/publications/pages/publication.aspx?docnum=9303), Machine Readable Travel Documents, published by the International Civil Aviation Organization (ICAO).

There are currently several types of ICAO standard machine-readable zones, which vary in the number of lines and characters in each line:

- TD-1 (e.g. citizen’s identification card, EU ID card, US Green Card): 3 lines of 30 characters each.
- TD-2 (e.g. Romanian ID, old type of German ID) and MRV-B (machine-readable visas type B, e.g. Schengen visa): 2 lines of 36 characters each.
- TD-3 (all international passports, also known as MRP) and MRV-A (machine-readable visas type A, issued by the USA, Japan, China, and others): 2 lines of 44 characters each.
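
The line counts and lengths above are enough to guess the MRZ type of a piece of text. The sketch below is an illustration, not the FastMRZ implementation: the function name `classify_mrz` and the returned labels are assumptions, and it relies on visa document codes starting with `V` to tell MRV-A and MRV-B apart from TD-3 and TD-2.

```python
# (number of lines, characters per line) -> type label
DOCUMENT_SHAPES = {(3, 30): "TD1", (2, 36): "TD2", (2, 44): "TD3"}
VISA_SHAPES = {(2, 36): "MRV-B", (2, 44): "MRV-A"}


def classify_mrz(mrz_text):
    """Guess the MRZ type from its shape, or return None if it matches none."""
    lines = [line.strip() for line in mrz_text.splitlines() if line.strip()]
    if not lines:
        return None
    lengths = {len(line) for line in lines}
    if len(lengths) != 1:
        return None  # all lines of a valid MRZ have the same length
    shape = (len(lines), lengths.pop())
    # Visas share their shape with TD2/TD3 but start with the letter V
    if lines[0].startswith("V") and shape in VISA_SHAPES:
        return VISA_SHAPES[shape]
    return DOCUMENT_SHAPES.get(shape)


passport = "P<GBRPUDARSAN<<HENERT".ljust(44, "<") + "\n" + \
    "7077979792GBR9505209M1704224" + "<" * 14 + "00"
print(classify_mrz(passport))
```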

Now, based on the example of a national passport, let us take a closer look at the MRZ composition.

![MRZ fields distribution](https://raw.githubusercontent.com/sivakumar-mahalingam/fastmrz/main/docs/mrz_fields_distribution.png)
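
To make the field layout concrete, here is a minimal sketch that slices a TD-3 (passport) MRZ by character position. The positions follow the TD-3 layout in ICAO Doc 9303 as commonly described; the function name `parse_td3` is made up for illustration, and the key names simply mirror the sample output in the Example section. It is not the FastMRZ parser, and it leaves dates in their raw YYMMDD form.

```python
def parse_td3(line1, line2):
    """Split the two 44-character lines of a TD-3 MRZ into named fields."""
    if len(line1) != 44 or len(line2) != 44:
        raise ValueError("a TD-3 MRZ has two lines of 44 characters each")
    # Surname and given names are separated by a double filler "<<"
    surname, _, given = line1[5:44].partition("<<")
    return {
        "document_code": line1[0:2].rstrip("<"),
        "issuer_code": line1[2:5],
        "surname": surname.replace("<", " ").strip(),
        "given_name": given.replace("<", " ").strip(),
        "document_number": line2[0:9].rstrip("<"),
        "document_number_checkdigit": line2[9],
        "nationality_code": line2[10:13],
        "birth_date": line2[13:19],    # YYMMDD, check digit at index 19
        "sex": line2[20],
        "expiry_date": line2[21:27],   # YYMMDD, check digit at index 27
        "optional_data": line2[28:42].rstrip("<"),
    }


# Sample MRZ rebuilt from the Example section output.
line1 = "P<GBRPUDARSAN<<HENERT".ljust(44, "<")
line2 = "7077979792GBR9505209M1704224" + "<" * 14 + "00"
for key, value in parse_td3(line1, line2).items():
    print(f"{key}: {value}")
```

Converting the two-digit years to full dates needs an extra rule for choosing the century, which is left out of this sketch.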

</details>

![MRZ GIF](https://raw.githubusercontent.com/sivakumar-mahalingam/fastmrz/main/docs/mrz.gif)

## ✅ToDo

- [x] Include mrva and mrvb documents
- [x] Add wiki page
- [x] Support numpy array as input
- [x] Support mrz text as input
- [x] Support base64 as input
- [ ] Support pdf as input
- [x] Function to return mrz text as output
- [ ] Bulk process
- [ ] Add function parameter - Image Enhancement Model
- [ ] Add function parameter - Text Image Enhancement Model
- [ ] Train Tesseract model with additional data
- [x] Add function parameter - include_checkdigit
- [ ] Add function - get_mrz_image
- [x] Add function - validate_mrz
- [ ] Add function - generate_mrz
- [ ] Extract face image
- [ ] Add documentation page

## 🤝Contributing

Contributions are welcome! Here's how you can help:

1. Fork the repository
2. Create a new branch (`git checkout -b feature/amazing-feature`)
3. Make your changes
4. Commit your changes (`git commit -m 'feat: add amazing feature'`)
5. Push to the branch (`git push origin feature/amazing-feature`)
6. Open a Pull Request

## ⚖️License

Distributed under the AGPL-3.0 License. See `LICENSE` for more information.

## 🙏Show your support

Give a ⭐️ if <a href="https://github.com/sivakumar-mahalingam/fastmrz/">this</a> project helped you!

## 🚀Who's Using It?

We’d love to know who’s using **fastmrz**! If your company or project uses this package, feel free to share your story. You can:

- Open an issue with the title "We are using fastmrz!" and include your project or company name.

Thank you for supporting **fastmrz**! 🤟



            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/sivakumar-mahalingam/fastmrz/",
    "name": "fastmrz",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "fastmrz, mrz, image processing, image recognition, ocr, computer vision, text recognition, text detection, artificial intelligence, onnx",
    "author": "Sivakumar Mahalingam",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/64/c4/af702f59fa98da7184a73d97427e374cd2fa5897f3d5a21e19195cb3e13d/fastmrz-2.1.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "AGPLv3",
    "summary": "Extracts the Machine Readable Zone (MRZ) data from document images",
    "version": "2.1.1",
    "project_urls": {
        "Homepage": "https://github.com/sivakumar-mahalingam/fastmrz/",
        "Source": "https://github.com/sivakumar-mahalingam/fastmrz",
        "Tracker": "https://github.com/sivakumar-mahalingam/fastmrz/issues"
    },
    "split_keywords": [
        "fastmrz",
        " mrz",
        " image processing",
        " image recognition",
        " ocr",
        " computer vision",
        " text recognition",
        " text detection",
        " artificial intelligence",
        " onnx"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "925cda06866d7cbcc71b27f80f56022d9367d1b348b4b85e74bd8d96e1a2a2a2",
                "md5": "7430524442c66ecfe169c7ab1e5aab20",
                "sha256": "71a31046f2dd347760bdd1110adfc01cd508421c253440a3ca28f9b9904a17ba"
            },
            "downloads": -1,
            "filename": "fastmrz-2.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "7430524442c66ecfe169c7ab1e5aab20",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 9773403,
            "upload_time": "2025-02-19T02:49:20",
            "upload_time_iso_8601": "2025-02-19T02:49:20.796348Z",
            "url": "https://files.pythonhosted.org/packages/92/5c/da06866d7cbcc71b27f80f56022d9367d1b348b4b85e74bd8d96e1a2a2a2/fastmrz-2.1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "64c4af702f59fa98da7184a73d97427e374cd2fa5897f3d5a21e19195cb3e13d",
                "md5": "f1829e68f5120661aa62612f3a184edd",
                "sha256": "674647832b86bf59f2a66f7d590d0e6833159de2adb9f7ce09a4b5f1f3e17ff6"
            },
            "downloads": -1,
            "filename": "fastmrz-2.1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "f1829e68f5120661aa62612f3a184edd",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 4900225,
            "upload_time": "2025-02-19T02:49:24",
            "upload_time_iso_8601": "2025-02-19T02:49:24.220649Z",
            "url": "https://files.pythonhosted.org/packages/64/c4/af702f59fa98da7184a73d97427e374cd2fa5897f3d5a21e19195cb3e13d/fastmrz-2.1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-02-19 02:49:24",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "sivakumar-mahalingam",
    "github_project": "fastmrz",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "fastmrz"
}
        