# Statement Verification
A Python package for verifying PDF statements from financial institutions. Extracts metadata, detects the issuing institution, and provides verification scores.
## Installation
```bash
# From PyPI (recommended)
pip install sverification
# Or from source
git clone https://github.com/Tausi-Africa/statement-verification.git
cd statement-verification
pip install -e .
```
## Quick Start
```bash
# Command line usage
verify-statement path/to/statement.pdf --brands statements_metadata.json
```
```python
# Python API - Simple verification
import sverification
result = sverification.verify_statement_verbose("statement.pdf")
print(f"Brand: {result['detected_brand']}, Score: {result['verification_score']:.1f}%")
```
## 📚 Function Reference
### 1. `verify_statement_verbose()` - Complete Verification
**Purpose**: Performs complete statement verification with detailed results.
```python
import sverification
# Basic usage
result = sverification.verify_statement_verbose("statement.pdf")
# With custom brands file
result = sverification.verify_statement_verbose(
    pdf_path="statement.pdf",
    brands_json_path="custom_brands.json"
)
# Access results
print(f"Detected Brand: {result['detected_brand']}")
print(f"Verification Score: {result['verification_score']:.1f}%")
print(f"Total Fields Checked: {result['total_fields']}")
print(f"Fields Matched: {result['matched_fields']}")
# Get detailed field results
for field in result['field_results']:
    status = "✓" if field['match'] else "✗"
    print(f"[{status}] {field['field']}: {field['actual']} (expected: {field['expected']})")
```
**Returns**: Dictionary with complete verification data
- `detected_brand`: Institution name
- `verification_score`: Percentage score (0-100)
- `field_results`: List of field comparisons
- `total_fields`: Number of fields checked
- `matched_fields`: Number of matching fields
- `summary`: Human-readable summary
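As a minimal sketch of consuming this dictionary, the helper below lists the fields that did not match. It relies only on the keys documented above; `failed_fields` and the sample dictionary are hypothetical, written by hand for illustration rather than produced by the package.

```python
def failed_fields(result):
    """Return (field, expected, actual) for every comparison that did not match."""
    return [
        (item["field"], item["expected"], item["actual"])
        for item in result.get("field_results", [])
        if not item["match"]
    ]


# Hand-written sample shaped like the documented return value (not real output)
sample_result = {
    "detected_brand": "selcom",
    "verification_score": 50.0,
    "total_fields": 2,
    "matched_fields": 1,
    "field_results": [
        {"field": "pdf_version", "expected": "1.4", "actual": "1.4", "match": True},
        {"field": "creator", "expected": "Selcom", "actual": "Word", "match": False},
    ],
}

for field, expected, actual in failed_fields(sample_result):
    print(f"{field}: expected {expected!r}, got {actual!r}")
```

A helper like this can be useful when only the mismatches need to be logged or reviewed.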
### 2. `print_verification_report()` - Formatted Output
**Purpose**: Prints a formatted verification report.
```python
import sverification
# Get verification results
result = sverification.verify_statement_verbose("statement.pdf")
# Print formatted report (same as CLI output)
sverification.print_verification_report(result)
# Output example:
# ========================================================================
# PDF: statement.pdf
# Detected brand: selcom
# Template used: selcom
# Fields checked: 5
# Fields matched: 5
# Verification score: 100.0%
# ------------------------------------------------------------------------
# Field Comparison (expected vs. actual):
# [✓] pdf_version expected='1.4' actual='1.4'
# [✓] creator expected='Selcom' actual='Selcom'
# ========================================================================
```
### 3. `extract_all()` - PDF Metadata Extraction
**Purpose**: Extracts comprehensive metadata from PDF files.
```python
import sverification
# Extract metadata
metadata = sverification.extract_all("statement.pdf")
# Access specific metadata
print(f"PDF Version: {metadata['pdf_version']}")
print(f"Creator: {metadata['creator']}")
print(f"Producer: {metadata['producer']}")
print(f"Creation Date: {metadata['creationdate']}")
print(f"Modification Date: {metadata['moddate']}")
print(f"EOF Markers: {metadata['eof_markers']}")
print(f"PDF Versions: {metadata['pdf_versions']}")
# Check for potential issues
if metadata['eof_markers'] > 1:
    print("⚠️ Multiple EOF markers detected")
if metadata['creationdate'] != metadata['moddate']:
    print("⚠️ Creation and modification dates differ")
```
**Returns**: Dictionary with extracted metadata
- `pdf_version`: PDF specification version
- `creator`: Application that created the PDF
- `producer`: Software that produced the PDF
- `creationdate`: When PDF was created
- `moddate`: When PDF was last modified
- `eof_markers`: Number of EOF markers (security indicator)
- `pdf_versions`: Number of PDF versions
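To illustrate what `eof_markers` measures: a PDF file ends with a `%%EOF` marker, and an incremental save appends a new section with its own `%%EOF`, so a count above 1 can indicate that the file was changed after it was first written. The sketch below counts the marker in raw bytes. `count_eof_markers` is a hypothetical helper, and this is not necessarily how `extract_all()` computes its value.

```python
def count_eof_markers(pdf_bytes: bytes) -> int:
    """Count occurrences of the %%EOF marker in raw PDF bytes."""
    return pdf_bytes.count(b"%%EOF")


# Tiny hand-made byte strings standing in for real files
original = b"%PDF-1.4\n...objects...\n%%EOF\n"
updated = original + b"...appended objects...\n%%EOF\n"

print(count_eof_markers(original))
print(count_eof_markers(updated))
```

A count above 1 is a signal worth reviewing rather than proof of tampering, since legitimate tools can also save files incrementally.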
### 4. `get_company_name()` - Institution Detection
**Purpose**: Automatically detects the financial institution from PDF content.
```python
import sverification
# Detect institution
company = sverification.get_company_name("statement.pdf")
print(f"Detected Institution: {company}")
# Handle unknown institutions
if company == "unknown":
    print("⚠️ Institution not recognized")
    print("Consider adding detection rules for this institution")
# Examples of detected institutions:
# "selcom", "vodacom", "airtel", "absa", "crdb", "nmb", etc.
```
**Returns**: String with institution code
- Returns standardized institution codes (e.g., "selcom", "vodacom")
- Returns "unknown" if institution cannot be detected
### 5. `load_brands()` - Template Management
**Purpose**: Loads institution templates for comparison.
```python
import sverification
# Load default templates
brands = sverification.load_brands("statements_metadata.json")
# Check available institutions
print("Available institutions:")
for brand_code, templates in brands.items():
    print(f" - {brand_code}: {len(templates)} template(s)")
# Get template for specific institution
selcom_templates = brands.get("selcom", [])
if selcom_templates:
    template = selcom_templates[0]  # Use first template
    print(f"Expected PDF version for Selcom: {template.get('pdf_version')}")
    print(f"Expected creator: {template.get('creator')}")
```
**Returns**: Dictionary mapping institution codes to template lists
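Based on the return value described above, a brands file can be pictured as a JSON object whose keys are institution codes and whose values are lists of template objects. The sketch below writes and reads such a file with the standard `json` module. The `examplebank` entry, the field values, and the temporary file are illustrative assumptions, and the real `statements_metadata.json` may use additional or different keys.

```python
import json
import os
import tempfile

# Hypothetical templates; real files may use different fields and values
sample_brands = {
    "selcom": [{"pdf_version": "1.4", "creator": "Selcom"}],
    "examplebank": [
        {"pdf_version": "1.4", "creator": "Example Creator"},
        {"pdf_version": "1.7", "creator": "Example Creator"},
    ],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "custom_brands.json")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(sample_brands, fh, indent=2)

    # Read it back the way any JSON template file could be loaded
    with open(path, encoding="utf-8") as fh:
        loaded = json.load(fh)

template_counts = {code: len(templates) for code, templates in loaded.items()}
print(template_counts)
```

If the schema assumption holds, a file of this shape could be passed as `brands_json_path` to `verify_statement_verbose()`.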
### 6. `compare_fields()` - Field-by-Field Comparison
**Purpose**: Compares extracted metadata against expected template.
```python
import sverification
# Extract metadata and load templates
metadata = sverification.extract_all("statement.pdf")
brands = sverification.load_brands("statements_metadata.json")
company = sverification.get_company_name("statement.pdf")
# Get expected template
expected = brands.get(company.lower(), [{}])[0]
# Compare fields
results, score = sverification.compare_fields(metadata, expected)
print(f"Overall Score: {score:.1f}%")
print("\nField-by-field results:")
for field_name, expected_val, actual_val, is_match in results:
    status = "✓ PASS" if is_match else "✗ FAIL"
    print(f"{status} {field_name}")
    print(f" Expected: {expected_val}")
    print(f" Actual: {actual_val}")
    print()
```
**Returns**: Tuple of (results_list, percentage_score)
- `results_list`: List of tuples (field, expected, actual, match_bool)
- `percentage_score`: Float between 0-100
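The return shape suggests a simple way to think about the score: the share of template fields whose actual value equals the expected one. The sketch below reproduces that idea with exact equality. `compare_fields_sketch` is a hypothetical stand-in, and the package's own matching rules (normalization, which fields are included) may differ.

```python
def compare_fields_sketch(actual, expected):
    """Compare two metadata dicts; return (results, percentage_score)."""
    results = []
    for field, expected_val in expected.items():
        actual_val = actual.get(field)  # None when the field is missing
        results.append((field, expected_val, actual_val, actual_val == expected_val))

    matched = sum(1 for row in results if row[3])
    score = 100.0 * matched / len(results) if results else 0.0
    return results, score


# Hand-written inputs, not real extraction output
actual = {"pdf_version": "1.4", "creator": "Selcom", "producer": "Other"}
expected = {"pdf_version": "1.4", "creator": "Selcom", "producer": "iText", "eof_markers": 1}

results, score = compare_fields_sketch(actual, expected)
print(f"{score:.1f}%")
```

In this sketch a field missing from the extracted metadata counts as a mismatch, and an empty template yields a score of 0 rather than raising a division error.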
## 🔄 Common Workflows
### Batch Processing Multiple Files
```python
import sverification
import os
def process_directory(pdf_directory):
    """Process all PDFs in a directory"""
    results = []
    for filename in os.listdir(pdf_directory):
        if filename.endswith('.pdf'):
            pdf_path = os.path.join(pdf_directory, filename)
            try:
                result = sverification.verify_statement_verbose(pdf_path)
                results.append({
                    'file': filename,
                    'brand': result['detected_brand'],
                    'score': result['verification_score']
                })
                print(f"✓ {filename}: {result['verification_score']:.1f}%")
            except Exception as e:
                print(f"✗ {filename}: Error - {e}")
    return results
# Process all PDFs in a folder
results = process_directory("./statements/")
```
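The list returned by `process_directory()` can be saved for later review. The sketch below writes rows of that shape as CSV, lowest score first, using the standard `csv` module. `write_results_csv` and the sample rows are hypothetical and are not part of the package.

```python
import csv
import io


def write_results_csv(results, stream):
    """Write batch results (dicts with file, brand, score) as CSV rows."""
    writer = csv.DictWriter(stream, fieldnames=["file", "brand", "score"])
    writer.writeheader()
    # Lowest scores first, so the most suspicious files are at the top
    for row in sorted(results, key=lambda r: r["score"]):
        writer.writerow(row)


# Hand-written rows shaped like the output of process_directory()
sample_results = [
    {"file": "a.pdf", "brand": "selcom", "score": 100.0},
    {"file": "b.pdf", "brand": "unknown", "score": 40.0},
]

buffer = io.StringIO()
write_results_csv(sample_results, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead of an in-memory buffer, pass a file opened with `open(path, "w", newline="")`.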
### Custom Analysis
```python
import sverification
def analyze_statement_quality(pdf_path):
    """Analyze statement quality indicators"""
    metadata = sverification.extract_all(pdf_path)
    company = sverification.get_company_name(pdf_path)
    issues = []
    # Check for multiple EOF markers (potential tampering)
    if metadata['eof_markers'] > 1:
        issues.append("Multiple EOF markers detected")
    # Check for date inconsistencies
    if metadata['creationdate'] != metadata['moddate']:
        issues.append("Creation and modification dates differ")
    # Check for unknown institution
    if company == "unknown":
        issues.append("Institution not recognized")
    return {
        'company': company,
        'issues': issues,
        'metadata': metadata
    }
# Analyze a statement
analysis = analyze_statement_quality("statement.pdf")
print(f"Institution: {analysis['company']}")
if analysis['issues']:
    print("⚠️ Issues found:")
    for issue in analysis['issues']:
        print(f" - {issue}")
else:
    print("✓ No issues detected")
```
## 🏦 Supported Institutions
- Banks: ABSA, CRDB, DTB, Exim, NMB, NBC, TCB, UBA
- Mobile Money: Airtel, Tigo, Vodacom, Halotel, Selcom
- Others: Azam Pesa, PayMaart, and more...
## 🧪 Testing
```bash
# Run tests
pytest
# Run with coverage
pytest --cov=sverification
```
## 📄 License
Proprietary software licensed by Black Swan AI Global. See [LICENSE](LICENSE) for details.
Raw data
{
"_id": null,
"home_page": "https://github.com/Tausi-Africa/statement-verification",
"name": "sverification",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "pdf verification statements metadata financial",
"author": "Alex Mkwizu @ Black Swan AI",
"author_email": "alex@bsa.ai",
"download_url": "https://files.pythonhosted.org/packages/5d/cd/1d33e66496e9c7b679d24b273f0f0875f17561a40b83adb342125af73385/sverification-0.1.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "A tool for verifying PDF statements from Tanzanian and beyond institutions",
"version": "0.1.1",
"project_urls": {
"Bug Tracker": "https://github.com/Tausi-Africa/statement-verification/issues",
"Homepage": "https://github.com/Tausi-Africa/statement-verification",
"Repository": "https://github.com/Tausi-Africa/statement-verification"
},
"split_keywords": [
"pdf",
"verification",
"statements",
"metadata",
"financial"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "de0aef1d0d7c31aa80a06716d550e83b55e0c43a05e94bc871bec10b98fb690c",
"md5": "59db002ea99f300bf1b344657f48bb7b",
"sha256": "d2706c7d338d2091efeca829273464fe4878de4c13ca72cdea43f845de50d7f1"
},
"downloads": -1,
"filename": "sverification-0.1.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "59db002ea99f300bf1b344657f48bb7b",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 20577,
"upload_time": "2025-09-02T18:23:52",
"upload_time_iso_8601": "2025-09-02T18:23:52.165884Z",
"url": "https://files.pythonhosted.org/packages/de/0a/ef1d0d7c31aa80a06716d550e83b55e0c43a05e94bc871bec10b98fb690c/sverification-0.1.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "5dcd1d33e66496e9c7b679d24b273f0f0875f17561a40b83adb342125af73385",
"md5": "5a5564b9b4f46c9136a03cb5883b73c7",
"sha256": "1c5127c73d0c1ba4cfa7277f37e067a729817acda832e7356e2c6d796a956d4d"
},
"downloads": -1,
"filename": "sverification-0.1.1.tar.gz",
"has_sig": false,
"md5_digest": "5a5564b9b4f46c9136a03cb5883b73c7",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 20560,
"upload_time": "2025-09-02T18:23:53",
"upload_time_iso_8601": "2025-09-02T18:23:53.579032Z",
"url": "https://files.pythonhosted.org/packages/5d/cd/1d33e66496e9c7b679d24b273f0f0875f17561a40b83adb342125af73385/sverification-0.1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-09-02 18:23:53",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Tausi-Africa",
"github_project": "statement-verification",
"github_not_found": true,
"lcname": "sverification"
}