mailprobe-py


- **Name**: mailprobe-py
- **Version**: 0.1.0
- **Home page**: https://github.com/huntsberg/mailprobe-py
- **Summary**: A Python implementation of advanced Bayesian email classification and filtering
- **Upload time**: 2025-09-01 03:48:00
- **Maintainer**: Peter Bowen
- **Author**: Peter Bowen
- **Requires Python**: <4.0,>=3.9
- **License**: MIT
- **Keywords**: email, filter, bayesian, classification, spam, multi-category
- **Requirements**: No requirements were recorded.

# MailProbe-Py

[![CI](https://github.com/huntsberg/mailprobe-py/actions/workflows/ci.yml/badge.svg)](https://github.com/huntsberg/mailprobe-py/actions/workflows/ci.yml)
[![PyPI version](https://badge.fury.io/py/mailprobe-py.svg)](https://badge.fury.io/py/mailprobe-py)
[![Python versions](https://img.shields.io/badge/python-3.9%2B-blue)](https://pypi.org/project/mailprobe-py/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

A Python implementation of the MailProbe Bayesian email classifier, inspired by the original C++ MailProbe by Burton Computer Corporation.

## Overview

MailProbe-Py is a statistical email classifier that uses Bayesian analysis to identify spam emails. It learns from examples of spam and legitimate emails to build a database of word frequencies, then uses this information to score new emails.
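
As a rough illustration of the idea (not the library's actual code), the sketch below counts how often each word appears in spam and in good messages, then combines the per-word spam probabilities into one score, in the style of Paul Graham's "A Plan for Spam". The class name `TinyBayes`, the tokenizer regex, the neutral value 0.5 for unseen words, and the 0.01/0.99 clamp are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter


class TinyBayes:
    """Toy word-frequency classifier; illustrative only."""

    def __init__(self):
        self.spam_words = Counter()
        self.good_words = Counter()
        self.spam_msgs = 0
        self.good_msgs = 0

    @staticmethod
    def tokenize(text):
        # A set, so each word counts at most once per message.
        return set(re.findall(r"[a-z0-9$'-]+", text.lower()))

    def train(self, text, is_spam):
        words = self.tokenize(text)
        if is_spam:
            self.spam_words.update(words)
            self.spam_msgs += 1
        else:
            self.good_words.update(words)
            self.good_msgs += 1

    def word_probability(self, word):
        # Fraction of spam / good messages that contain the word.
        spam_freq = self.spam_words[word] / max(self.spam_msgs, 1)
        good_freq = self.good_words[word] / max(self.good_msgs, 1)
        if spam_freq + good_freq == 0:
            return 0.5  # unseen word: treated as neutral (assumed default)
        p = spam_freq / (spam_freq + good_freq)
        return min(max(p, 0.01), 0.99)  # clamp so no single word is decisive

    def score(self, text):
        # P(spam) = prod(p) / (prod(p) + prod(1 - p)), computed in log space.
        log_spam = log_good = 0.0
        for word in self.tokenize(text):
            p = self.word_probability(word)
            log_spam += math.log(p)
            log_good += math.log(1.0 - p)
        diff = max(min(log_good - log_spam, 700.0), -700.0)  # avoid overflow
        return 1.0 / (1.0 + math.exp(diff))
```

A score near 1.0 suggests spam and a score near 0.0 suggests legitimate mail; a message made only of unseen words scores exactly 0.5.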

## Features

- **Bayesian Analysis**: Uses statistical analysis of word frequencies to identify spam
- **Learning Capability**: Trains on your specific email patterns for personalized filtering
- **Multiple Email Formats**: Supports mbox, maildir, and individual email files
- **MIME Support**: Handles MIME attachments and encoding (quoted-printable, base64)
- **Phrase Analysis**: Analyzes both individual words and multi-word phrases
- **Header Analysis**: Separately analyzes different email headers for improved accuracy
- **Database Management**: Built-in database cleanup and maintenance commands
- **Command Line Interface**: Comprehensive CLI matching original MailProbe functionality
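
To make the MIME, phrase, and header features more concrete, here is a minimal sketch built on the standard library's `email` package. It is not MailProbe-Py's tokenizer: the function names `extract_tokens` and `with_phrases`, the `header:` prefix scheme, the choice of headers, and the two-word phrase limit are assumptions for illustration.

```python
import re
from email import message_from_string
from email.policy import default


def with_phrases(words, max_len):
    """Return single words plus phrases of up to max_len adjacent words."""
    out = []
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            out.append(" ".join(words[i:i + n]))
    return out


def extract_tokens(raw_message, max_phrase_len=2):
    """Return header-prefixed tokens and body tokens for one raw message."""
    msg = message_from_string(raw_message, policy=default)
    tokens = []

    # Prefix header words so "free" in the Subject is a different token
    # from "free" in the body.
    for header in ("from", "subject"):
        value = str(msg.get(header, ""))
        words = re.findall(r"[a-z0-9@.'-]+", value.lower())
        tokens += [f"{header}:{t}" for t in with_phrases(words, max_phrase_len)]

    # get_content() undoes quoted-printable and base64 transfer encodings.
    body = msg.get_body(preferencelist=("plain",))
    text = body.get_content() if body is not None else ""
    words = re.findall(r"[a-z0-9$'-]+", text.lower())
    tokens += with_phrases(words, max_phrase_len)
    return tokens
```

Tokens produced this way could feed a word-frequency table, with header and body occurrences counted separately.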

## Installation

### Using Poetry (Recommended)

```bash
git clone https://github.com/huntsberg/mailprobe-py
cd mailprobe-py
poetry install
```

### Using pip

```bash
pip install mailprobe-py
```

For development, install from a source checkout with the `dev` dependency group:

```bash
git clone https://github.com/huntsberg/mailprobe-py
cd mailprobe-py
poetry install --with dev
```

## Quick Start

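### Command Line Usage

The commands below assume a Poetry checkout. If you installed with pip, the `mailprobe-py` command should be available directly, without the `poetry run` prefix.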

1. **Create a database**:
   ```bash
   poetry run mailprobe-py create-db
   ```

2. **Train on existing emails**:
   ```bash
   poetry run mailprobe-py good ~/mail/inbox
   poetry run mailprobe-py spam ~/mail/spam
   ```

3. **Score new emails**:
   ```bash
   poetry run mailprobe-py score < new_email.txt
   ```

### Python API Usage (Recommended for Integration)

```python
from mailprobe import MailProbeAPI

# Create email classifier
with MailProbeAPI() as spam_filter:
    # Train on messages
    spam_filter.train_good(["From: friend@example.com\nSubject: Meeting\n\nLet's meet tomorrow."])
    spam_filter.train_spam(["From: spammer@bad.com\nSubject: FREE MONEY\n\nClick here!"])
    
    # Classify new messages
    email_content = "From: unknown@example.com\nSubject: Question\n\nI have a question."
    is_spam = spam_filter.classify_text(email_content)
    print(f"Is spam: {is_spam}")
    
    # Get detailed results
    result = spam_filter.classify_text(email_content, return_details=True)
    print(f"Probability: {result.probability:.3f}")
```
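
The training calls above take lists of raw message strings. If your mail lives in an mbox file, one way to gather such strings is the standard library's `mailbox` module. This is a sketch only: `messages_from_mbox` is a hypothetical helper, not part of MailProbe-Py, and the API may offer its own way to read mailboxes.

```python
import mailbox


def messages_from_mbox(path):
    """Return every message in an mbox file as a raw string."""
    # create=False raises an error instead of silently creating an empty file.
    box = mailbox.mbox(path, create=False)
    try:
        return [msg.as_string() for msg in box]
    finally:
        box.close()
```

The resulting list could then be passed to `spam_filter.train_good(...)` or `spam_filter.train_spam(...)`. Note that `mailbox.mbox` does not expand `~`, so call `os.path.expanduser` on such paths first.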

### Convenience Functions

```python
from mailprobe import classify_email, get_spam_probability

# Quick classification
is_spam = classify_email("From: test@example.com\nSubject: Test\n\nHello")
probability = get_spam_probability("From: test@example.com\nSubject: Test\n\nHello")
```

### Multi-Category Classification

MailProbe-Py supports classification into multiple categories beyond just spam/not-spam:

```python
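from mailprobe import MultiCategoryFilter, FolderBasedClassifier

# Sample data for illustration; replace with your own raw messages
work_emails = ["From: boss@example.com\nSubject: Report\n\nPlease send the report."]
personal_emails = ["From: friend@example.com\nSubject: Dinner\n\nDinner on Friday?"]
email_content = "From: unknown@example.com\nSubject: Question\n\nI have a question."

# Multi-category classification
categories = ['work', 'personal', 'newsletters', 'spam']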
with MultiCategoryFilter(categories) as classifier:
    # Train on different categories
    classifier.train_category('work', work_emails)
    classifier.train_category('personal', personal_emails)
    
    # Classify into categories
    result = classifier.classify(email_content)
    print(f"Category: {result.category}, Confidence: {result.confidence:.3f}")

# Folder-based classification (auto-discovers categories from folders)
with FolderBasedClassifier('emails/') as classifier:
    classifier.train_from_folders()  # Train from folder structure
    result = classifier.classify(email_content)
    print(f"Should go to folder: {result.category}")
```
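
Folder-based classification depends on mapping a directory layout to categories. The sketch below shows one plausible way to do that discovery, assuming one subfolder per category with one message per file. That layout and the helper name `discover_categories` are assumptions; see MULTI_CATEGORY_GUIDE.md for what `FolderBasedClassifier` actually expects.

```python
from pathlib import Path


def discover_categories(root):
    """Map each immediate subfolder name to the sorted files it contains."""
    categories = {}
    for folder in sorted(Path(root).iterdir()):
        # Skip plain files and hidden folders such as ".git".
        if folder.is_dir() and not folder.name.startswith("."):
            categories[folder.name] = sorted(
                p for p in folder.iterdir() if p.is_file()
            )
    return categories
```

With `emails/work/` and `emails/personal/` on disk, `discover_categories("emails/")` would return a dictionary keyed by `"personal"` and `"work"`.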

## Commands

- `create-db` - Create a new spam database
- `train` - Score and selectively train on messages
- `receive` - Score and always train on messages  
- `score` - Score messages without updating database
- `good` - Mark messages as non-spam
- `spam` - Mark messages as spam
- `remove` - Remove messages from database
- `dump` - Export database contents
- `cleanup` - Remove old/rare terms from database
- `help` - Show detailed help for commands
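
The difference between `train`, `receive`, and `score` comes down to when the database is updated. The sketch below illustrates that decision under one assumption: that "selectively" means learning only from messages whose score falls close to the spam threshold. The function name `should_update` and the `threshold` and `margin` values are hypothetical, not taken from MailProbe-Py.

```python
def should_update(score, mode, threshold=0.9, margin=0.1):
    """Decide whether a scored message is added to the database."""
    if mode == "receive":
        return True  # always learn from the message
    if mode == "train":
        # Learn only from borderline messages (assumed meaning of "selectively").
        return abs(score - threshold) <= margin
    if mode == "score":
        return False  # read-only: never touch the database
    raise ValueError(f"unknown mode: {mode!r}")
```

Selective training keeps the database smaller and reduces writes, at the cost of ignoring messages the classifier is already confident about.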

## Configuration

MailProbe-Py stores its database in `~/.mailprobe-py/` by default. You can change this with the `-d` option:

```bash
poetry run mailprobe-py -d /path/to/database score < email.txt
```
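
When calling the CLI from another program, the same `-d` option can be passed through `subprocess`. This is a minimal sketch: the helper names are hypothetical, it assumes a `mailprobe-py` executable is on `PATH` (for example after a pip install), and it returns the command's output unparsed because the output format is not described here.

```python
import subprocess


def build_score_command(db_dir, executable="mailprobe-py"):
    """Build the argument list for scoring one message against db_dir."""
    return [executable, "-d", str(db_dir), "score"]


def score_message(raw_message, db_dir):
    """Pipe a raw message to the CLI on stdin and return its stdout."""
    result = subprocess.run(
        build_score_command(db_dir),
        input=raw_message,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout.strip()
```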

## Integration with Mail Systems

### Procmail Example

Add to your `.procmailrc`:

```
:0 fw
| poetry run mailprobe-py receive

:0:
* ^X-MailProbe: SPAM
spam/
```
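
The second recipe files a message under `spam/` when it carries an `X-MailProbe` header that starts with `SPAM`. For delivery scripts written in Python, the same check might look like the sketch below. The helper name `is_marked_spam` is hypothetical, and the exact header value written by `receive` is assumed from the recipe above.

```python
from email import message_from_string


def is_marked_spam(raw_message, header="X-MailProbe"):
    """Return True if the header value starts with SPAM (case-insensitive)."""
    value = message_from_string(raw_message).get(header, "")
    return str(value).strip().upper().startswith("SPAM")
```

The comparison ignores case because procmail conditions are case-insensitive by default.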

### Postfix Example

Configure as a content filter in Postfix to automatically score incoming mail.

## Accuracy

Like the original MailProbe, this implementation typically achieves:
- A spam detection rate of 99% or higher
- A false positive rate below 0.1%

Accuracy improves as you train it on your own email.
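
To check figures like these against your own mail, both rates can be computed from a labeled set of classified messages. The sketch below shows the arithmetic. The function name and the `(actual_is_spam, predicted_is_spam)` input shape are assumptions for the example.

```python
def detection_and_false_positive_rates(results):
    """Compute (detection rate, false positive rate).

    results: iterable of (actual_is_spam, predicted_is_spam) boolean pairs.
    """
    spam_total = spam_caught = good_total = good_flagged = 0
    for actual_spam, predicted_spam in results:
        if actual_spam:
            spam_total += 1
            spam_caught += bool(predicted_spam)
        else:
            good_total += 1
            good_flagged += bool(predicted_spam)
    # Detection rate: share of real spam that was flagged.
    detection = spam_caught / spam_total if spam_total else 0.0
    # False positive rate: share of good mail wrongly flagged.
    false_positive = good_flagged / good_total if good_total else 0.0
    return detection, false_positive
```

For example, catching 990 of 1,000 spam messages gives a detection rate of 0.99, and flagging 1 of 2,000 good messages gives a false positive rate of 0.0005 (0.05%).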

## Testing

MailProbe-Py includes a comprehensive test suite with **97 tests** and **81% code coverage**.

### Running Tests

```bash
# Run all tests
poetry run pytest

# Run tests with coverage
poetry run pytest --cov=src/mailprobe --cov-report=html

# Run specific test categories
poetry run pytest tests/test_api.py      # API tests
poetry run pytest tests/test_cli.py      # CLI tests
poetry run pytest tests/test_filter.py   # Core filtering tests

# Use the convenient test runner
python run_tests.py quick      # Quick test run
python run_tests.py coverage   # With coverage report
python run_tests.py html       # Generate HTML report
```

### Test Coverage

- **API Tests**: High-level object-oriented interface
- **CLI Tests**: Command-line interface functionality  
- **Core Tests**: Spam filtering, database, tokenization
- **Integration Tests**: End-to-end workflows
- **Error Handling**: Edge cases and invalid inputs

All tests pass, and the library is ready for production use.

## Documentation

- **[USAGE.md](USAGE.md)** - Comprehensive usage guide with examples
- **[OO_API_GUIDE.md](OO_API_GUIDE.md)** - Object-oriented API documentation
- **[MULTI_CATEGORY_GUIDE.md](MULTI_CATEGORY_GUIDE.md)** - Multi-category classification guide
- **[DEVELOPMENT.md](DEVELOPMENT.md)** - Development setup and guidelines
- **[TEST_SUMMARY.md](TEST_SUMMARY.md)** - Detailed test documentation
- **[CHANGELOG.md](CHANGELOG.md)** - Version history and changes

## License

MIT License - see LICENSE file for details.

## Acknowledgments

Based on the original MailProbe by Burton Computer Corporation, which was itself inspired by Paul Graham's article "A Plan for Spam".

            
