passport_mrz_extractor
======================

``passport_mrz_extractor`` is a Python library for extracting and validating Machine Readable Zone (MRZ) data from passport images.
It uses Tesseract OCR to read the MRZ text and validates the result with the ``mrz`` library.

Features
--------

- Extract MRZ data from passport images.
- Validate MRZ data fields, including document type, name, nationality, date of birth, and expiry date.
- Automatic image preprocessing to improve OCR accuracy.

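
``passport_mrz_extractor`` delegates validation to the ``mrz`` library. To illustrate what that validation involves, the sketch below implements the ICAO Doc 9303 check-digit rule (character values weighted 7, 3, 1, summed, modulo 10) and applies it to line 2 of a TD3 (passport) MRZ. The helper names ``mrz_check_digit`` and ``validate_td3_line2`` are hypothetical and are not part of either library's API; the sample line is the ICAO Doc 9303 specimen passport.

```python
from __future__ import annotations

# Hypothetical helpers illustrating ICAO Doc 9303 check digits;
# they are NOT part of passport_mrz_extractor or mrz.

WEIGHTS = (7, 3, 1)


def mrz_check_digit(field: str) -> int:
    """Compute the ICAO 9303 check digit for an MRZ field."""
    total = 0
    for i, ch in enumerate(field):
        if "0" <= ch <= "9":
            value = int(ch)
        elif "A" <= ch <= "Z":
            value = ord(ch) - ord("A") + 10  # A=10 ... Z=35
        elif ch == "<":
            value = 0  # filler character
        else:
            raise ValueError(f"invalid MRZ character: {ch!r}")
        total += value * WEIGHTS[i % 3]
    return total % 10


def validate_td3_line2(line: str) -> dict[str, bool]:
    """Check every check digit on line 2 of a TD3 (passport) MRZ."""
    if len(line) != 44:
        raise ValueError(f"TD3 lines have 44 characters, got {len(line)}")
    # field name -> (field characters, check digit character)
    checks = {
        "document_number": (line[0:9], line[9]),
        "birth_date": (line[13:19], line[19]),
        "expiry_date": (line[21:27], line[27]),
        "personal_number": (line[28:42], line[42]),
        # the composite digit covers document number, birth date, expiry date,
        # optional data, and their individual check digits
        "composite": (line[0:10] + line[13:20] + line[21:43], line[43]),
    }
    results = {}
    for name, (field, digit) in checks.items():
        ok = digit == str(mrz_check_digit(field))
        # ICAO 9303 allows '<' as the check digit of an all-filler personal number
        if name == "personal_number" and digit == "<" and set(field) == {"<"}:
            ok = True
        results[name] = ok
    return results


specimen = "L898902C36UTO7408122F1204159ZE184226B<<<<<10"
print(validate_td3_line2(specimen))  # every value is True for this specimen
```

A single misread character usually breaks at least one check digit, which is why MRZ validation is a useful guard against OCR errors.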
Installation
------------

You can install ``passport_mrz_extractor`` using ``pip``:

.. code-block:: bash

    pip install passport_mrz_extractor

Requirements
------------

- **Python** >= 3.10
- **Tesseract OCR** installed on your system

To install Tesseract:

- **Ubuntu**: ``sudo apt install tesseract-ocr``
- **macOS (using Homebrew)**: ``brew install tesseract``
- **Windows**: download the installer from https://github.com/UB-Mannheim/tesseract/wiki

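
``pytesseract`` works by invoking the ``tesseract`` executable, so it can help to confirm the binary is on your ``PATH`` before calling the library. The sketch below uses only the standard library; ``tesseract_version`` is a hypothetical helper, not part of ``passport_mrz_extractor``.

```python
from __future__ import annotations

import shutil
import subprocess


def tesseract_version(binary: str = "tesseract") -> str | None:
    """Return the first line of `<binary> --version`, or None if it is not on PATH."""
    path = shutil.which(binary)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge streams in case a build prints the banner to stderr
        text=True,
        check=False,
    )
    output = result.stdout.strip()
    return output.splitlines()[0] if output else None


version = tesseract_version()
print(version or "Tesseract not found; see the installation steps above.")
```

On Windows, if the installer did not add Tesseract to ``PATH``, ``pytesseract`` lets you point at the executable directly by setting ``pytesseract.pytesseract.tesseract_cmd`` to its full path.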
Dependencies
------------

This library requires the following Python packages:

- ``pytesseract`` - for performing OCR on images.
- ``opencv-python`` - for image processing.
- ``mrz`` - for MRZ data validation.
- ``Pillow`` - for handling image files in Python.

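
The PyPI names above do not always match the module names you import: ``opencv-python`` is imported as ``cv2`` and ``Pillow`` as ``PIL``. A quick standard-library check like the sketch below can confirm all four are installed; ``missing_dependencies`` is a hypothetical helper, not part of the library. ``importlib.util.find_spec`` only locates a module and does not import it.

```python
from __future__ import annotations

from importlib.util import find_spec

# PyPI distribution name -> module name used in `import` statements.
IMPORT_NAMES = {
    "pytesseract": "pytesseract",
    "opencv-python": "cv2",
    "mrz": "mrz",
    "Pillow": "PIL",
}


def missing_dependencies(import_names: dict[str, str] = IMPORT_NAMES) -> list[str]:
    """Return the PyPI names of dependencies whose modules cannot be found."""
    return [dist for dist, module in import_names.items() if find_spec(module) is None]


missing = missing_dependencies()
if missing:
    print("Missing packages:", ", ".join(missing))
    print("Install them with: pip install " + " ".join(missing))
else:
    print("All Python dependencies are importable.")
```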
Usage
-----

Here’s how to use ``passport_mrz_extractor`` to extract MRZ data from a passport image.

Basic Example
~~~~~~~~~~~~~

This example extracts all available MRZ fields from an image and handles potential errors.

.. code-block:: python

    from passport_mrz_extractor import read_mrz

    # Path to the passport image
    image_path = 'path/to/passport_image.jpg'

    try:
        mrz_data = read_mrz(image_path)
        print("Extracted MRZ Data:")
        # Print every field returned for this passport
        for key, value in mrz_data.items():
            print(f"{key}: {value}")
    except ValueError as e:
        print(f"Error reading MRZ: {e}")

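
To process a folder of scans, the same ``try``/``except ValueError`` pattern extends to a loop. In this sketch ``read_mrz_folder`` is a hypothetical helper: it takes the reader function as a parameter so it can be tried without Tesseract, and in real use you would pass ``read_mrz``. The ``*.jpg`` default pattern is an assumption about your file names.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any, Callable


def read_mrz_folder(
    folder: str | Path,
    reader: Callable[[str], Any],
    pattern: str = "*.jpg",
) -> tuple[dict[str, Any], dict[str, str]]:
    """Run `reader` on every matching image, collecting results and errors separately."""
    results: dict[str, Any] = {}
    errors: dict[str, str] = {}
    for path in sorted(Path(folder).glob(pattern)):
        try:
            results[path.name] = reader(str(path))
        except ValueError as exc:  # the exception the README examples catch
            errors[path.name] = str(exc)
    return results, errors


# Example (requires passport_mrz_extractor and Tesseract):
# from passport_mrz_extractor import read_mrz
# results, errors = read_mrz_folder("path/to/scans", read_mrz)
```

Keeping failures in a separate dictionary means one unreadable scan does not stop the rest of the batch.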
Example of Using Specific MRZ Fields
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In this example, we extract specific fields such as the country, document number, and birth date, and print each one with a label.

.. code-block:: python

    from passport_mrz_extractor import read_mrz

    # Path to the passport image
    image_path = 'path/to/passport_image.jpg'

    try:
        # Extract MRZ data with the imported read_mrz function
        mrz_data = read_mrz(image_path)

        # Display specific fields; .get() returns None for a missing key
        print("Country of Issue:", mrz_data.get("country"))
        print("Document Number:", mrz_data.get("document_number"))
        print("Name:", mrz_data.get("name"))
        print("Surname:", mrz_data.get("surname"))
        print("Date of Birth:", mrz_data.get("birth_date"))
        print("Expiry Date:", mrz_data.get("expiry_date"))
        print("Nationality:", mrz_data.get("nationality"))
        print("Sex:", mrz_data.get("sex"))

    except ValueError as e:
        print(f"Error reading MRZ: {e}")

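
MRZ dates carry only two year digits. Assuming ``read_mrz`` returns ``birth_date`` and ``expiry_date`` as the raw six-digit ``YYMMDD`` strings found in the MRZ (this README does not specify the format, so check the values you actually get back), a hypothetical helper like ``parse_mrz_date`` below can turn them into ``datetime.date`` objects. The century rule is a heuristic chosen for this sketch, not behavior of the library.

```python
from __future__ import annotations

from datetime import date


def parse_mrz_date(yymmdd: str, *, is_birth_date: bool, today: date | None = None) -> date:
    """Convert a six-digit MRZ date (YYMMDD) into a datetime.date.

    Heuristic century rule (an assumption of this sketch): birth dates whose
    year would lie after the current year go in the 1900s; expiry dates
    always go in the 2000s.
    """
    if len(yymmdd) != 6 or not (yymmdd.isascii() and yymmdd.isdigit()):
        raise ValueError(f"expected six digits (YYMMDD), got {yymmdd!r}")
    today = today or date.today()
    yy, mm, dd = int(yymmdd[:2]), int(yymmdd[2:4]), int(yymmdd[4:])
    year = 2000 + yy
    if is_birth_date and year > today.year:
        year -= 100
    return date(year, mm, dd)  # raises ValueError for impossible dates such as month 13


birth = parse_mrz_date("740812", is_birth_date=True)
expiry = parse_mrz_date("120415", is_birth_date=False)
print(birth, expiry, "expired" if expiry < date.today() else "valid")
```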
Contributing
------------

If you'd like to contribute, please fork the repository and work on a feature branch. Pull requests are welcome.

Issues
------

If you encounter any issues, please report them on the GitHub repository:

https://github.com/Azim-Kenzh/passport_mrz_extractor/issues

License
-------

``passport_mrz_extractor`` is licensed under the MIT License.

Raw data
{
"_id": null,
"home_page": "https://github.com/Azim-Kenzh/passport_mrz_extractor",
"name": "passport-mrz-extractor",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": "MRZ passport OCR Tesseract image-processing",
"author": "Azimkozho Kenzhebek uulu",
"author_email": "azimkozho.inventor@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/28/79/54ea90e1b001576c9300e1cf24885215f26646a18f017508b93936a42089/passport_mrz_extractor-1.0.13.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A Python library for reading MRZ data from passport images using Tesseract OCR",
"version": "1.0.13",
"project_urls": {
"Homepage": "https://github.com/Azim-Kenzh/passport_mrz_extractor"
},
"split_keywords": [
"mrz",
"passport",
"ocr",
"tesseract",
"image-processing"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "287954ea90e1b001576c9300e1cf24885215f26646a18f017508b93936a42089",
"md5": "36add40eb88164792ebfa8d0f19d06f4",
"sha256": "10ab904e47b6b17d5462984d6168d0ab664cbda8d06c95310c1de929c0ee8d93"
},
"downloads": -1,
"filename": "passport_mrz_extractor-1.0.13.tar.gz",
"has_sig": false,
"md5_digest": "36add40eb88164792ebfa8d0f19d06f4",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 3949,
"upload_time": "2024-12-03T12:44:50",
"upload_time_iso_8601": "2024-12-03T12:44:50.271731Z",
"url": "https://files.pythonhosted.org/packages/28/79/54ea90e1b001576c9300e1cf24885215f26646a18f017508b93936a42089/passport_mrz_extractor-1.0.13.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-03 12:44:50",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Azim-Kenzh",
"github_project": "passport_mrz_extractor",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "Pillow",
"specs": [
[
"==",
"11.0.0"
]
]
},
{
"name": "pytesseract",
"specs": [
[
"==",
"0.3.13"
]
]
},
{
"name": "mrz",
"specs": [
[
"==",
"0.6.2"
]
]
}
],
"lcname": "passport-mrz-extractor"
}