bank-statement-separator

- Name: bank-statement-separator
- Version: 0.2.0
- Summary: AI-powered tool for separating multi-statement PDF files using LangChain and LangGraph
- Upload time: 2025-09-08 05:16:42
- Requires Python: >=3.11
- Keywords: pdf, ai, langchain, bank-statement, document-processing, automation, financial, machine-learning

# Bank Statement Separator

[![Documentation](https://img.shields.io/badge/docs-online-blue)](https://madeinoz67.github.io/bank-statement-separator/)
[![Tests](https://img.shields.io/badge/tests-37%2F37%20passing-brightgreen)](https://github.com/madeinoz67/bank-statement-separator/actions)
[![PyPI](https://img.shields.io/pypi/v/bank-statement-separator)](https://pypi.org/project/bank-statement-separator/)
[![Python](https://img.shields.io/pypi/pyversions/bank-statement-separator)](https://pypi.org/project/bank-statement-separator/)

An AI-powered tool that automatically processes PDF files containing multiple bank statements and separates them into individual files. Built with LangChain and LangGraph for robust stateful AI processing.

## 🚀 Features

- **AI-Powered Analysis**: Uses advanced language models to detect statement boundaries
- **Multiple LLM Support**: Compatible with OpenAI GPT models and Ollama local models
- **PDF Processing**: Efficient document manipulation using PyMuPDF
- **Metadata Extraction**: Automatically extracts account numbers, dates, and bank information
- **File Organization**: Generates meaningful filenames following configurable patterns
- **Error Handling**: Comprehensive logging and audit trails
- **Security Controls**: Built-in safeguards for production use
- **Paperless Integration**: Optional integration with Paperless-ngx for document management
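
To illustrate the file-organization feature, the sketch below builds an output filename from extracted metadata. The pattern string, the field names (`bank`, `account_number`, `period_end`), and the `build_filename` helper are hypothetical and are not taken from the project's code; the real placeholders may differ.

```python
import re
from datetime import date
from typing import Any

# Hypothetical pattern; the project's actual placeholders may differ.
DEFAULT_PATTERN = "{bank}-{account_last4}-{period_end}.pdf"


def build_filename(metadata: dict[str, Any], pattern: str = DEFAULT_PATTERN) -> str:
    """Fill the pattern with metadata and replace characters unsafe in filenames."""
    fields = {
        "bank": str(metadata.get("bank", "unknown")).lower(),
        # Keep only the last four digits so the full account number never
        # appears in the filename.
        "account_last4": str(metadata.get("account_number", "0000"))[-4:],
        "period_end": metadata.get("period_end", date.min).isoformat(),
    }
    name = pattern.format(**fields)
    # Replace anything other than letters, digits, dot, dash, and underscore.
    return re.sub(r"[^A-Za-z0-9._-]+", "_", name)


example_name = build_filename(
    {
        "bank": "Example Bank",
        "account_number": "1234567890",
        "period_end": date(2025, 8, 31),
    }
)
print(example_name)
```

Truncating the account number is a design choice made for this sketch, in keeping with the security goals listed later in this README.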

## 📋 Requirements

- Python 3.11+
- OpenAI API key (for LLM functionality with OpenAI models)
- UV package manager

## 🛠 Installation

### 1. Clone the Repository

```bash
git clone https://github.com/madeinoz67/bank-statement-separator.git
cd bank-statement-separator
```

### 2. Install Dependencies

```bash
# Install with UV
uv sync

# Install with dev dependencies
uv sync --group dev
```

### 3. Configure Environment

Copy the example environment file and configure your settings:

```bash
cp .env.example .env
```

Edit `.env` to set your OpenAI API key:
```bash
OPENAI_API_KEY=your_api_key_here
```

## 📖 Usage

### Basic Usage

```bash
# Process a single PDF file
uv run python -m src.bank_statement_separator.main input.pdf

# Specify output directory
uv run python -m src.bank_statement_separator.main input.pdf -o ./output

# Use verbose logging
uv run python -m src.bank_statement_separator.main input.pdf --verbose

# Dry run mode (no files written)
uv run python -m src.bank_statement_separator.main input.pdf --dry-run
```

### Advanced Options

```bash
# Specify LLM model
uv run python -m src.bank_statement_separator.main input.pdf --model gpt-4o

# Set custom processing limits
uv run python -m src.bank_statement_separator.main input.pdf --max-pages 50

# Enable debug mode
uv run python -m src.bank_statement_separator.main input.pdf --debug
```

### Configuration

The application uses environment variables for configuration. Key settings include:

- `OPENAI_API_KEY`: Your OpenAI API key
- `OLLAMA_BASE_URL`: Ollama server URL (for local models)
- `LOG_LEVEL`: Logging verbosity (DEBUG, INFO, WARNING, ERROR)
- `MAX_PAGES_PER_STATEMENT`: Maximum number of pages processed per statement
- `OUTPUT_DIR`: Default output directory

See [Configuration Guide](docs/getting-started/configuration.md) for complete details.
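
As a rough illustration of how these variables might be read, the sketch below collects them into a small settings object using only the standard library. The variable names come from the list above; the `Settings` class, the `load_settings` helper, and every default value are assumptions made for the example, not the project's actual configuration code.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    openai_api_key: Optional[str]
    ollama_base_url: Optional[str]
    log_level: str
    max_pages_per_statement: int
    output_dir: Path


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables; the defaults are assumed values."""
    log_level = env.get("LOG_LEVEL", "INFO").upper()
    if log_level not in {"DEBUG", "INFO", "WARNING", "ERROR"}:
        raise ValueError(f"Unsupported LOG_LEVEL: {log_level}")
    return Settings(
        openai_api_key=env.get("OPENAI_API_KEY"),
        ollama_base_url=env.get("OLLAMA_BASE_URL"),
        log_level=log_level,
        # int() raises ValueError on a non-numeric value, which surfaces
        # configuration mistakes early.
        max_pages_per_statement=int(env.get("MAX_PAGES_PER_STATEMENT", "50")),
        output_dir=Path(env.get("OUTPUT_DIR", "./output")),
    )


# Pass an explicit mapping instead of os.environ to keep the demo reproducible.
settings = load_settings({"LOG_LEVEL": "debug", "OUTPUT_DIR": "./separated"})
print(settings.log_level, settings.max_pages_per_statement, settings.output_dir)
```

Accepting the environment as a parameter keeps the loader easy to test without modifying the real process environment.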

## 🏗 Architecture

The system consists of several key components:

- **Workflow Engine**: LangGraph-based state machine for processing steps
- **LLM Analyzer**: AI-powered boundary detection and metadata extraction
- **PDF Processor**: Document manipulation and text extraction
- **Error Handler**: Comprehensive error management and recovery
- **Rate Limiter**: API usage controls and backoff mechanisms
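
The backoff idea behind the rate limiter can be sketched in a few lines. This is a generic exponential-backoff retry written for illustration, not the project's implementation; `call_with_backoff` and all of its parameters are hypothetical names.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on any exception with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # out of retries: propagate the last error
            # Delay doubles on each attempt (1s, 2s, 4s, ...) up to max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Up to 10% jitter avoids synchronized retries across clients.
            sleep(delay + random.uniform(0, delay / 10))
    raise AssertionError("unreachable")


# Demo: a call that fails twice before succeeding. The sleep function is
# replaced with a recorder so the example runs instantly.
attempts: list[int] = []
delays: list[float] = []


def flaky() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("simulated transient failure")
    return "ok"


result = call_with_backoff(flaky, sleep=delays.append)
print(result, len(attempts), len(delays))
```

A production version would normally retry only on specific, retryable errors (for example rate-limit responses) rather than on every exception.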

### Processing Pipeline

1. **PDF Ingestion**: Load and validate input documents
2. **Document Analysis**: Extract text and structural information
3. **Statement Detection**: AI boundary detection using LLM analysis
4. **Metadata Extraction**: Account and period information extraction
5. **PDF Generation**: Create individual statement files
6. **File Organization**: Apply naming conventions and organization
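
The shape of such a pipeline, where each step receives a shared state and returns an updated copy, can be sketched in plain Python. This is a simplified stand-in and does not use LangGraph; the `State` fields, the step functions, and the "page text starts with `Statement`" rule that replaces the LLM call are all assumptions made for the example. Only steps 3 and 6 are represented.

```python
from typing import Callable, TypedDict


class State(TypedDict, total=False):
    pages: list[str]                   # extracted text of each page
    boundaries: list[tuple[int, int]]  # (first, last) page index, 0-based inclusive
    outputs: list[str]                 # planned output filenames


def detect_statements(state: State) -> State:
    # Placeholder for the LLM step: a page starting with "Statement" opens a new statement.
    pages = state["pages"]
    starts = [i for i, text in enumerate(pages) if text.startswith("Statement")]
    # Each statement ends on the page before the next one starts.
    ends = [s - 1 for s in starts[1:]] + [len(pages) - 1]
    return {**state, "boundaries": list(zip(starts, ends))}


def organize_files(state: State) -> State:
    # Name each output after its 1-based page range in the source PDF.
    names = [
        f"statement_{n}_pages_{first + 1}-{last + 1}.pdf"
        for n, (first, last) in enumerate(state["boundaries"], start=1)
    ]
    return {**state, "outputs": names}


PIPELINE: list[Callable[[State], State]] = [detect_statements, organize_files]


def run(pages: list[str]) -> State:
    state: State = {"pages": pages}
    for step in PIPELINE:
        state = step(state)
    return state


result = run(["Statement A p1", "A p2", "Statement B p1"])
print(result["boundaries"], result["outputs"])
```

Returning a new state from every step, rather than mutating shared data, mirrors how state-machine workflows pass results between nodes and makes each step testable in isolation.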

## 🧪 Testing

```bash
# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=src

# Run specific test categories
uv run pytest tests/unit/
uv run pytest tests/integration/
```

## 🤝 Contributing

We welcome contributions! Please follow these guidelines:

### Development Setup

1. Fork the repository
2. Create a feature branch: `git checkout -b feature/your-feature`
3. Install development dependencies: `uv sync --group dev`
4. Make your changes
5. Run tests: `uv run pytest`
6. Format code: `uv run ruff format .`
7. Check linting: `uv run ruff check .`
8. Commit your changes with a descriptive message
9. Push to your fork and create a pull request

### Code Quality

- Follow PEP 8 style guidelines
- Use type hints for all function parameters and return values
- Write comprehensive docstrings for public APIs
- Add tests for new features and bug fixes
- Keep functions focused and small
- Use descriptive variable and function names

### Pull Request Process

1. Ensure all tests pass
2. Update documentation if needed
3. Add appropriate commit trailers (see below)
4. Request review from maintainers

### Commit Guidelines

For commits fixing bugs or adding features based on user reports:
```bash
git commit --trailer "Reported-by:<name>"
```

For commits related to a GitHub issue:
```bash
git commit --trailer "Github-Issue:#<number>"
```

## 📚 Documentation

📖 **[Read the full documentation online](https://madeinoz67.github.io/bank-statement-separator/)**

Complete documentation is available in the `docs/` directory:

- [Getting Started](docs/getting-started/)
- [User Guide](docs/user-guide/)
- [Developer Guide](docs/developer-guide/)
- [API Reference](docs/reference/)
- [Architecture](docs/architecture/)

Build documentation locally:
```bash
uv run mkdocs serve
```

## 📦 Dependencies

### Core Dependencies

- `langchain`: LLM integration framework
- `langgraph`: Stateful workflow orchestration
- `pymupdf`: PDF processing
- `pydantic`: Data validation
- `rich`: Terminal formatting
- `python-dotenv`: Environment management

### Development Dependencies

- `pytest`: Testing framework
- `ruff`: Code formatting and linting
- `pyright`: Type checking
- `mkdocs`: Documentation generation

See `pyproject.toml` for the complete dependency list.

## 🔒 Security

- API keys are managed through environment variables
- Input validation on all user-provided data
- Rate limiting for external API calls
- Comprehensive logging for audit trails
- No sensitive data stored in application logs
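
One common way to keep sensitive values out of logs is a `logging.Filter` that masks them before a record is emitted. The sketch below shows the idea with the standard library; the "eight or more digits" rule, the `redact` helper, and the `RedactAccountNumbers` class are assumptions for illustration, not the project's actual safeguards.

```python
import logging
import re

# Assumption: treat any run of 8 or more digits as a possible account number.
ACCOUNT_RE = re.compile(r"\b\d{8,}\b")


def redact(text: str) -> str:
    """Mask all but the last four digits of long digit runs."""
    return ACCOUNT_RE.sub(
        lambda m: "*" * (len(m.group()) - 4) + m.group()[-4:], text
    )


class RedactAccountNumbers(logging.Filter):
    """Logging filter that rewrites each record's message through redact()."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format the message first so values passed as arguments are covered too.
        record.msg = redact(record.getMessage())
        record.args = ()  # the message is already fully formatted
        return True  # keep the record; only its content changes


logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
logger = logging.getLogger("redaction_demo")
logger.addFilter(RedactAccountNumbers())
logger.info("Processed account %s", "1234567890")
```

A filter attached to a logger applies only to records logged through that logger; attaching it to a handler instead covers records propagated from child loggers as well.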

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- Built with [LangChain](https://langchain.com/) and [LangGraph](https://langchain-ai.github.io/langgraph/)
- PDF processing powered by [PyMuPDF](https://pymupdf.readthedocs.io/)
- Inspired by the need for automated document processing in financial workflows

## 🐛 Issues & Support

- Report bugs via [GitHub Issues](https://github.com/madeinoz67/bank-statement-separator/issues)
- Check [Troubleshooting Guide](docs/reference/troubleshooting.md) for common issues
- Review [Known Issues](docs/known_issues/) for current limitations

---

**Note**: This tool requires an OpenAI API key for AI functionality. It falls back to pattern matching if the LLM is unavailable.
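
To give a sense of what a pattern-matching fallback could look like, the sketch below marks a page as the start of a statement when its text contains "Page 1 of N". This heuristic, the regular expression, and the `find_statement_starts` helper are assumptions for illustration; the project's actual fallback rules are not described in this README.

```python
import re

# Assumed heuristic: a page whose text contains "Page 1 of N" starts a new statement.
FIRST_PAGE_RE = re.compile(r"\bpage\s+1\s+of\s+\d+\b", re.IGNORECASE)


def find_statement_starts(page_texts: list[str]) -> list[int]:
    """Return 0-based indices of pages that look like the first page of a statement."""
    starts = [i for i, text in enumerate(page_texts) if FIRST_PAGE_RE.search(text)]
    # With no match, treat a non-empty document as a single statement.
    return starts or ([0] if page_texts else [])


pages = [
    "ACME Bank  Page 1 of 2",
    "Page 2 of 2",
    "Other Bank  page 1 of 1",
]
print(find_statement_starts(pages))
```

Real statements vary widely in layout, so a heuristic like this would likely need to be combined with other signals, such as account numbers or statement-period headers.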

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "bank-statement-separator",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.11",
    "maintainer_email": "Stephen Eaton <seaton@strobotics.com.au>",
    "keywords": "pdf, ai, langchain, bank-statement, document-processing, automation, financial, machine-learning",
    "author": null,
    "author_email": "Stephen Eaton <seaton@strobotics.com.au>",
    "download_url": "https://files.pythonhosted.org/packages/98/21/e3dadc96d774f55b22ef52e4de0e8b756de899a1bf82ee281ae03b3fa937/bank_statement_separator-0.2.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "AI-powered tool for separating multi-statement PDF files using LangChain and LangGraph",
    "version": "0.2.0",
    "project_urls": {
        "Changelog": "https://github.com/madeinoz67/bank-statement-separator/blob/main/docs/release_notes/CHANGELOG.md",
        "Documentation": "https://madeinoz67.github.io/bank-statement-separator/",
        "Homepage": "https://github.com/madeinoz67/bank-statement-separator",
        "Issues": "https://github.com/madeinoz67/bank-statement-separator/issues",
        "Repository": "https://github.com/madeinoz67/bank-statement-separator"
    },
    "split_keywords": [
        "pdf",
        " ai",
        " langchain",
        " bank-statement",
        " document-processing",
        " automation",
        " financial",
        " machine-learning"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "de13fdfe85f3da2e03847f91047dc3021abe6c8ce0acdc5c33fef0624ae7e590",
                "md5": "57605f8e871b0d294d4de93fc29830b2",
                "sha256": "c1fea9bdeb854e7857a8b268883e39ff92635137bc3c7cf4371007d9cde9f372"
            },
            "downloads": -1,
            "filename": "bank_statement_separator-0.2.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "57605f8e871b0d294d4de93fc29830b2",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.11",
            "size": 29663,
            "upload_time": "2025-09-08T05:16:41",
            "upload_time_iso_8601": "2025-09-08T05:16:41.655559Z",
            "url": "https://files.pythonhosted.org/packages/de/13/fdfe85f3da2e03847f91047dc3021abe6c8ce0acdc5c33fef0624ae7e590/bank_statement_separator-0.2.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "9821e3dadc96d774f55b22ef52e4de0e8b756de899a1bf82ee281ae03b3fa937",
                "md5": "124bb5b0cdb4890fa1a5bc554bcca63e",
                "sha256": "60d654141ee247b4b96915e6c106f3083a76eb8f73894828610d4d186f5cfa07"
            },
            "downloads": -1,
            "filename": "bank_statement_separator-0.2.0.tar.gz",
            "has_sig": false,
            "md5_digest": "124bb5b0cdb4890fa1a5bc554bcca63e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.11",
            "size": 32077,
            "upload_time": "2025-09-08T05:16:42",
            "upload_time_iso_8601": "2025-09-08T05:16:42.981756Z",
            "url": "https://files.pythonhosted.org/packages/98/21/e3dadc96d774f55b22ef52e4de0e8b756de899a1bf82ee281ae03b3fa937/bank_statement_separator-0.2.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-09-08 05:16:42",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "madeinoz67",
    "github_project": "bank-statement-separator",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "bank-statement-separator"
}
        