sverification

| Field | Value |
|---|---|
| Name | sverification |
| Version | 0.1.1 |
| Summary | A tool for verifying PDF statements from Tanzanian institutions and beyond |
| Homepage | https://github.com/Tausi-Africa/statement-verification |
| Bug Tracker | https://github.com/Tausi-Africa/statement-verification/issues |
| Author | Alex Mkwizu @ Black Swan AI (alex@bsa.ai) |
| Requires Python | >=3.8 |
| Keywords | pdf verification statements metadata financial |
| Requirements | None recorded |
| Uploaded | 2025-09-02 18:23:53 |

# Statement Verification

A Python package for verifying PDF statements from financial institutions. It extracts PDF metadata, detects the issuing institution, and compares the metadata against that institution's template to produce a verification score.

## Installation

```bash
# From PyPI (recommended)
pip install sverification

# Or from source
git clone https://github.com/Tausi-Africa/statement-verification.git
cd statement-verification
pip install -e .
```

## Quick Start

```bash
# Command line usage
verify-statement path/to/statement.pdf --brands statements_metadata.json
```

```python
# Python API - Simple verification
import sverification

result = sverification.verify_statement_verbose("statement.pdf")
print(f"Brand: {result['detected_brand']}, Score: {result['verification_score']:.1f}%")
```

## 📚 Function Reference

### 1. `verify_statement_verbose()` - Complete Verification

**Purpose**: Performs complete statement verification with detailed results.

```python
import sverification

# Basic usage
result = sverification.verify_statement_verbose("statement.pdf")

# With custom brands file
result = sverification.verify_statement_verbose(
    pdf_path="statement.pdf",
    brands_json_path="custom_brands.json"
)

# Access results
print(f"Detected Brand: {result['detected_brand']}")
print(f"Verification Score: {result['verification_score']:.1f}%")
print(f"Total Fields Checked: {result['total_fields']}")
print(f"Fields Matched: {result['matched_fields']}")

# Get detailed field results
for field in result['field_results']:
    status = "✓" if field['match'] else "✗"
    print(f"[{status}] {field['field']}: {field['actual']} (expected: {field['expected']})")
```

**Returns**: Dictionary with complete verification data
- `detected_brand`: Institution name
- `verification_score`: Percentage score (0-100)
- `field_results`: List of field comparisons
- `total_fields`: Number of fields checked
- `matched_fields`: Number of matching fields
- `summary`: Human-readable summary
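In an automated pipeline you will usually want to turn this dictionary into a decision. The sketch below does that using only the documented `detected_brand` and `verification_score` keys. The `triage()` helper, the 80% threshold, and the label names are hypothetical policy choices, and the sketch assumes `detected_brand` uses the same `"unknown"` sentinel that `get_company_name()` returns.

```python
from typing import Any, Dict


def triage(result: Dict[str, Any], threshold: float = 80.0) -> str:
    """Map a verify_statement_verbose()-style result to a coarse label.

    Hypothetical helper: the threshold and labels are illustrative policy
    choices, not part of sverification.
    """
    # Assumption: an unrecognised institution is reported as "unknown",
    # which means there was no template to compare against.
    brand = str(result.get("detected_brand", "unknown")).lower()
    if brand == "unknown":
        return "unrecognised"
    # verification_score is documented as a percentage (0-100).
    score = float(result.get("verification_score", 0.0))
    return "pass" if score >= threshold else "review"


if __name__ == "__main__":
    sample = {"detected_brand": "selcom", "verification_score": 100.0}
    print(triage(sample))
```

With the package installed, call it as `triage(sverification.verify_statement_verbose("statement.pdf"))`. Keeping the threshold as a parameter lets you tune it per institution without touching the verification call.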

### 2. `print_verification_report()` - Formatted Output

**Purpose**: Prints a formatted verification report.

```python
import sverification

# Get verification results
result = sverification.verify_statement_verbose("statement.pdf")

# Print formatted report (same as CLI output)
sverification.print_verification_report(result)

# Output example:
# ========================================================================
# PDF: statement.pdf
# Detected brand: selcom
# Template used: selcom
# Fields checked: 5
# Fields matched: 5
# Verification score: 100.0%
# ------------------------------------------------------------------------
# Field Comparison (expected vs. actual):
#   [✓] pdf_version      expected='1.4'  actual='1.4'
#   [✓] creator          expected='Selcom'  actual='Selcom'
#   ...  (remaining fields omitted)
# ========================================================================
```
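`print_verification_report()` writes to standard output. If you need the report as a string, for example to log it or save it next to the PDF, one option is to capture stdout with the standard library. This is a generic sketch: `capture_stdout()` is a hypothetical helper, and it assumes the report is emitted through `print()`/`sys.stdout` rather than a logger.

```python
import io
from contextlib import redirect_stdout
from typing import Any, Callable


def capture_stdout(func: Callable[..., Any], *args: Any, **kwargs: Any) -> str:
    """Call func(*args, **kwargs) and return whatever it printed to stdout."""
    buffer = io.StringIO()
    # redirect_stdout temporarily swaps sys.stdout for the in-memory buffer.
    with redirect_stdout(buffer):
        func(*args, **kwargs)
    return buffer.getvalue()


if __name__ == "__main__":
    # Stand-in for sverification.print_verification_report so the sketch runs on its own.
    def fake_report(result):
        print(f"Verification score: {result['verification_score']:.1f}%")

    text = capture_stdout(fake_report, {"verification_score": 100.0})
    print(repr(text))
```

With the package installed, `capture_stdout(sverification.print_verification_report, result)` should return the report text, which you can then write to a file.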

### 3. `extract_all()` - PDF Metadata Extraction

**Purpose**: Extracts comprehensive metadata from PDF files.

```python
import sverification

# Extract metadata
metadata = sverification.extract_all("statement.pdf")

# Access specific metadata
print(f"PDF Version: {metadata['pdf_version']}")
print(f"Creator: {metadata['creator']}")
print(f"Producer: {metadata['producer']}")
print(f"Creation Date: {metadata['creationdate']}")
print(f"Modification Date: {metadata['moddate']}")
print(f"EOF Markers: {metadata['eof_markers']}")
print(f"PDF Versions: {metadata['pdf_versions']}")

# Check for potential issues
if metadata['eof_markers'] > 1:
    print("⚠️  Multiple EOF markers detected")

if metadata['creationdate'] != metadata['moddate']:
    print("⚠️  Creation and modification dates differ")
```

**Returns**: Dictionary with extracted metadata
- `pdf_version`: PDF specification version
- `creator`: Application that created the PDF
- `producer`: Software that produced the PDF
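- `creationdate`: When the PDF was created
- `moddate`: When the PDF was last modified
- `eof_markers`: Number of `%%EOF` markers; more than one can indicate incremental updates appended after the file was first written (a possible sign of editing)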
- `pdf_versions`: Number of PDF versions
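The README does not show how `extract_all()` computes `pdf_version` or `eof_markers`. For intuition, the standard-library sketch below reads the same two signals directly from the file bytes: the version in the `%PDF-x.y` header and the number of `%%EOF` markers. It is an approximation, not the package's implementation, and the helper names are hypothetical. Caveats: a document catalog's `/Version` entry can supersede the header version, linearized ("fast web view") files commonly contain two `%%EOF` markers without having been edited, and the byte sequence could in principle also occur inside stream data.

```python
import re
from typing import Optional

# The header looks like "%PDF-1.4" near the very start of the file.
_HEADER_RE = re.compile(rb"%PDF-(\d\.\d)")


def pdf_header_version(data: bytes) -> Optional[str]:
    """Return the version declared in the PDF header (e.g. "1.4"), or None."""
    # Search only the first 1024 bytes, tolerating a little leading junk.
    match = _HEADER_RE.search(data[:1024])
    return match.group(1).decode("ascii") if match else None


def count_eof_markers(data: bytes) -> int:
    """Count %%EOF markers; each incremental update normally appends another one."""
    return data.count(b"%%EOF")


if __name__ == "__main__":
    sample = b"%PDF-1.4\n1 0 obj\n%%EOF\n2 0 obj\n%%EOF\n"
    print(pdf_header_version(sample), count_eof_markers(sample))
```

To run it on a real file, read the bytes first: `with open("statement.pdf", "rb") as fh: data = fh.read()`.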

### 4. `get_company_name()` - Institution Detection

**Purpose**: Automatically detects the financial institution from PDF content.

```python
import sverification

# Detect institution
company = sverification.get_company_name("statement.pdf")
print(f"Detected Institution: {company}")

# Handle unknown institutions
if company == "unknown":
    print("⚠️  Institution not recognized")
    print("Consider adding detection rules for this institution")

# Examples of detected institutions:
# "selcom", "vodacom", "airtel", "absa", "crdb", "nmb", etc.
```

**Returns**: String with institution code
- Returns standardized institution codes (e.g., "selcom", "vodacom")
- Returns "unknown" if institution cannot be detected
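The detection rules themselves are not documented here. A common approach for this kind of task is case-insensitive keyword matching over text already extracted from the PDF, sketched below. The `RULES` table, its phrases, and `detect_institution()` are hypothetical and are not sverification's actual rules; unlike `get_company_name()`, the sketch takes text rather than a file path.

```python
from typing import Dict, Iterable

# Hypothetical detection rules: institution code -> phrases that identify it.
# Illustrative only; these are not the package's real rules.
RULES: Dict[str, Iterable[str]] = {
    "selcom": ["selcom"],
    "vodacom": ["vodacom", "m-pesa"],
    "crdb": ["crdb bank"],
}


def detect_institution(text: str, rules: Dict[str, Iterable[str]] = RULES) -> str:
    """Return the first institution code whose phrase appears in text, else "unknown"."""
    haystack = text.lower()
    # Dicts keep insertion order, so earlier rules take priority.
    for code, phrases in rules.items():
        if any(phrase.lower() in haystack for phrase in phrases):
            return code
    return "unknown"


if __name__ == "__main__":
    print(detect_institution("Statement issued by Vodacom M-Pesa"))
```

Because the first match wins, list more specific phrases before generic ones when rules could overlap.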

### 5. `load_brands()` - Template Management

**Purpose**: Loads institution templates for comparison.

```python
import sverification

# Load default templates
brands = sverification.load_brands("statements_metadata.json")

# Check available institutions
print("Available institutions:")
for brand_code, templates in brands.items():
    print(f"  - {brand_code}: {len(templates)} template(s)")

# Get template for specific institution
selcom_templates = brands.get("selcom", [])
if selcom_templates:
    template = selcom_templates[0]  # Use first template
    print(f"Expected PDF version for Selcom: {template.get('pdf_version')}")
    print(f"Expected creator: {template.get('creator')}")
```

**Returns**: Dictionary mapping institution codes to template lists
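The README does not document the layout of `statements_metadata.json`. Judging from the example above (`brands.items()` yields lists of templates, and each template supports `.get('pdf_version')`), a plausible shape is an object mapping institution codes to lists of template objects. The loader below assumes that shape; `load_templates()` is a hypothetical helper that also accepts a single template object as shorthand and lowercases codes so they match a `company.lower()` lookup.

```python
import json
from typing import Any, Dict, List

# Assumed file shape (hypothetical):
# {"selcom": [{"pdf_version": "1.4", "creator": "Selcom"}], "crdb": [...]}


def load_templates(raw_json: str) -> Dict[str, List[Dict[str, Any]]]:
    """Parse a brands JSON document into {institution_code: [template, ...]}."""
    data = json.loads(raw_json)
    brands: Dict[str, List[Dict[str, Any]]] = {}
    for code, templates in data.items():
        # Accept a single template object as shorthand for a one-item list.
        if isinstance(templates, dict):
            templates = [templates]
        brands[code.lower()] = list(templates)
    return brands


if __name__ == "__main__":
    print(load_templates('{"Selcom": {"pdf_version": "1.4", "creator": "Selcom"}}'))
```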

### 6. `compare_fields()` - Field-by-Field Comparison

**Purpose**: Compares extracted metadata against expected template.

```python
import sverification

# Extract metadata and load templates
metadata = sverification.extract_all("statement.pdf")
brands = sverification.load_brands("statements_metadata.json")
company = sverification.get_company_name("statement.pdf")

# Get expected template (fall back to an empty template if none is registered)
templates = brands.get(company.lower()) or [{}]
expected = templates[0]

# Compare fields
results, score = sverification.compare_fields(metadata, expected)

print(f"Overall Score: {score:.1f}%")
print("\nField-by-field results:")

for field_name, expected_val, actual_val, is_match in results:
    status = "✓ PASS" if is_match else "✗ FAIL"
    print(f"{status} {field_name}")
    print(f"  Expected: {expected_val}")
    print(f"  Actual:   {actual_val}")
    print()
```

**Returns**: Tuple of (results_list, percentage_score)
- `results_list`: List of tuples (field, expected, actual, match_bool)
- `percentage_score`: Float between 0-100
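To reason about edge cases, the sketch below reproduces the documented contract of `compare_fields()`: a list of `(field, expected, actual, match_bool)` tuples plus a 0-100 score. It assumes the score is the percentage of template fields that matched and that values are compared as trimmed strings; both are assumptions, and `compare_fields_sketch()` is a hypothetical name, not part of the package.

```python
from typing import Any, Dict, List, Tuple

Row = Tuple[str, Any, Any, bool]


def compare_fields_sketch(actual: Dict[str, Any], expected: Dict[str, Any]) -> Tuple[List[Row], float]:
    """Compare each expected field with the extracted value; return rows and a 0-100 score."""
    rows: List[Row] = []
    for field, expected_val in expected.items():
        actual_val = actual.get(field)
        # Compare as trimmed strings so 1.4 and "1.4" count as equal.
        is_match = actual_val is not None and str(actual_val).strip() == str(expected_val).strip()
        rows.append((field, expected_val, actual_val, is_match))
    matched = sum(1 for row in rows if row[3])
    # Guard against an empty template (e.g. an unrecognised institution).
    score = 100.0 * matched / len(rows) if rows else 0.0
    return rows, score


if __name__ == "__main__":
    rows, score = compare_fields_sketch(
        {"pdf_version": "1.4", "creator": "Selcom"},
        {"pdf_version": "1.4", "creator": "Other"},
    )
    print(rows, score)
```

The empty-template guard matters because an unknown institution leaves nothing to compare; returning 0.0 rather than dividing by zero keeps batch runs from crashing.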

## 🔄 Common Workflows

### Batch Processing Multiple Files

```python
import sverification
import os

def process_directory(pdf_directory):
    """Process all PDFs in a directory"""
    results = []
    
    for filename in sorted(os.listdir(pdf_directory)):
        if filename.lower().endswith('.pdf'):
            pdf_path = os.path.join(pdf_directory, filename)
            
            try:
                result = sverification.verify_statement_verbose(pdf_path)
                results.append({
                    'file': filename,
                    'brand': result['detected_brand'],
                    'score': result['verification_score']
                })
                print(f"✓ {filename}: {result['verification_score']:.1f}%")
            except Exception as e:
                print(f"✗ {filename}: Error - {e}")
    
    return results

# Process all PDFs in a folder
results = process_directory("./statements/")
```
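To keep a record of a batch run, the list returned by `process_directory()` can be written to CSV with the standard `csv` module. The column names mirror the dictionary keys used above; `write_results_csv()` and `results_to_csv_text()` are hypothetical helpers.

```python
import csv
import io
from typing import Dict, Iterable, TextIO

FIELDS = ["file", "brand", "score"]


def write_results_csv(results: Iterable[Dict[str, object]], out: TextIO) -> None:
    """Write rows shaped like process_directory()'s output as CSV with a header."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for row in results:
        writer.writerow(row)


def results_to_csv_text(results: Iterable[Dict[str, object]]) -> str:
    """Convenience wrapper that returns the CSV as a string."""
    buffer = io.StringIO()
    write_results_csv(results, buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(results_to_csv_text([{"file": "a.pdf", "brand": "selcom", "score": 100.0}]))
```

When writing to disk, open the file with `newline=""` (for example `open("results.csv", "w", newline="", encoding="utf-8")`), as the `csv` module documentation recommends, to avoid blank lines on Windows.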

### Custom Analysis

```python
import sverification

def analyze_statement_quality(pdf_path):
    """Analyze statement quality indicators"""
    metadata = sverification.extract_all(pdf_path)
    company = sverification.get_company_name(pdf_path)
    
    issues = []
    
    # Check for multiple EOF markers (possible incremental edits after creation)
    if metadata['eof_markers'] > 1:
        issues.append("Multiple EOF markers detected")
    
    # Check for date inconsistencies
    if metadata['creationdate'] != metadata['moddate']:
        issues.append("Creation and modification dates differ")
    
    # Check for unknown institution
    if company == "unknown":
        issues.append("Institution not recognized")
    
    return {
        'company': company,
        'issues': issues,
        'metadata': metadata
    }

# Analyze a statement
analysis = analyze_statement_quality("statement.pdf")
print(f"Institution: {analysis['company']}")
if analysis['issues']:
    print("⚠️  Issues found:")
    for issue in analysis['issues']:
        print(f"  - {issue}")
else:
    print("✓ No issues detected")
```
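Comparing `creationdate` and `moddate` as raw strings also flags timestamps that denote the same instant written with different time-zone notation, and it cannot tolerate small gaps. The sketch below parses PDF date strings (`D:YYYYMMDDHHmmSSOHH'mm'`, as defined in the PDF specification) and compares instants instead. It assumes `extract_all()` returns those raw strings; if it already returns `datetime` objects, skip the parsing step. The helper names, the 60-second tolerance, and treating dates without an offset as UTC are all assumptions.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# D:YYYYMMDDHHmmSS followed by an optional offset such as Z, +03'00' or -05'00'.
# Every component after the year is optional in the PDF date format.
_PDF_DATE_RE = re.compile(
    r"^(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([Zz+\-])(?:(\d{2})'?(\d{2})?'?)?)?"
)


def parse_pdf_date(value: str) -> Optional[datetime]:
    """Parse a PDF date string into a datetime, or return None if it is not one."""
    match = _PDF_DATE_RE.match(value.strip())
    if not match:
        return None
    year, month, day, hour, minute, second, sign, tz_h, tz_m = match.groups()
    tz = None
    if sign in ("Z", "z"):
        tz = timezone.utc
    elif sign in ("+", "-"):
        offset = timedelta(hours=int(tz_h or 0), minutes=int(tz_m or 0))
        tz = timezone(offset if sign == "+" else -offset)
    try:
        return datetime(int(year), int(month or 1), int(day or 1),
                        int(hour or 0), int(minute or 0), int(second or 0), tzinfo=tz)
    except ValueError:  # out-of-range component, e.g. month 13
        return None


def dates_differ(created: str, modified: str, tolerance_s: float = 60.0) -> Optional[bool]:
    """True if the dates are more than tolerance_s apart; None if either is unparseable."""
    a, b = parse_pdf_date(created), parse_pdf_date(modified)
    if a is None or b is None:
        return None
    # Assumption: a date without an offset is UTC, so naive and aware values compare.
    a = a if a.tzinfo else a.replace(tzinfo=timezone.utc)
    b = b if b.tzinfo else b.replace(tzinfo=timezone.utc)
    return abs((b - a).total_seconds()) > tolerance_s
```

Returning `None` for unparseable input keeps "could not check" distinct from "dates match", so a caller can report it as its own issue rather than silently passing the file.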

## 🏦 Supported Institutions

Banks: ABSA, CRDB, DTB, Exim, NMB, NBC, TCB, UBA  
Mobile Money: Airtel, Tigo, Vodacom, Halotel, Selcom  
Others: Azam Pesa, PayMaart, and more...

## 🧪 Testing

```bash
# Run tests
pytest

# Run with coverage
pytest --cov=sverification
```

## 📄 License

Proprietary software licensed by Black Swan AI Global. See [LICENSE](LICENSE) for details.

