# MailProbe-Py
[CI](https://github.com/huntsberg/mailprobe-py/actions/workflows/ci.yml)
[PyPI version](https://badge.fury.io/py/mailprobe-py)
[Python versions](https://pypi.org/project/mailprobe-py/)
[License: MIT](https://opensource.org/licenses/MIT)
A Python implementation of the MailProbe Bayesian email classifier, inspired by the original C++ MailProbe by Burton Computer Corporation.
## Overview
MailProbe-Py is a statistical email classifier that uses Bayesian analysis to identify spam emails. It learns from examples of spam and legitimate emails to build a database of word frequencies, then uses this information to score new emails.
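The README does not spell out MailProbe-Py's exact scoring formula. As a rough illustration of how per-word evidence can be turned into a single spam score, here is a minimal sketch of the naive-Bayes combining rule from Paul Graham's "A Plan for Spam", which the Acknowledgments cite as an inspiration. The function name `combine_probabilities` is hypothetical, and MailProbe-Py's actual computation may differ.

```python
import math


def combine_probabilities(token_probs: list[float]) -> float:
    """Combine per-token spam probabilities into a single message score.

    Hypothetical helper illustrating Graham's rule:
        P = prod(p) / (prod(p) + prod(1 - p))
    The products are computed as sums of logs so long messages do not underflow.
    """
    if not token_probs:
        return 0.5  # no evidence either way

    # Clamp each probability to [0.01, 0.99], as in Graham's essay,
    # so a single token can never force the score to exactly 0 or 1.
    clamped = [min(max(p, 0.01), 0.99) for p in token_probs]
    log_spam = sum(math.log(p) for p in clamped)
    log_ham = sum(math.log(1.0 - p) for p in clamped)

    # P = 1 / (1 + exp(log_ham - log_spam)), written in a numerically stable form.
    d = log_spam - log_ham
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


print(f"{combine_probabilities([0.99, 0.99, 0.2]):.4f}")  # prints 0.9996
```

Graham's essay applies this rule only to the 15 most "interesting" tokens (those whose probabilities lie farthest from 0.5); how MailProbe-Py selects tokens is not covered by this sketch.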
## Features
- **Bayesian Analysis**: Uses statistical analysis of word frequencies to identify spam
- **Learning Capability**: Trains on your specific email patterns for personalized filtering
- **Multiple Email Formats**: Supports mbox, maildir, and individual email files
- **MIME Support**: Handles MIME attachments and encoding (quoted-printable, base64)
- **Phrase Analysis**: Analyzes both individual words and multi-word phrases
- **Header Analysis**: Separately analyzes different email headers for improved accuracy
- **Database Management**: Built-in database cleanup and maintenance commands
- **Command Line Interface**: Comprehensive CLI matching original MailProbe functionality
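The MIME, phrase, and header features can be pictured with a small tokenizer. The sketch below is an assumption, not MailProbe-Py's tokenizer: it uses the standard-library `email` package to undo base64/quoted-printable encoding in text parts, emits single words plus two-word phrases, and prefixes header tokens with the header name so that "free" in a Subject line is counted separately from "free" in the body. The token pattern, the choice of headers, and the names `tokenize` and `tokenize_message` are all hypothetical.

```python
import re
from email import message_from_string

# Hypothetical token pattern: runs of letters, digits, $, apostrophes, hyphens.
WORD_RE = re.compile(r"[A-Za-z0-9$'-]+")


def tokenize(text: str, prefix: str = "", max_phrase: int = 2) -> list[str]:
    """Return single words plus phrases of up to `max_phrase` consecutive words."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    tokens = []
    for n in range(1, max_phrase + 1):
        for i in range(len(words) - n + 1):
            tokens.append(prefix + " ".join(words[i:i + n]))
    return tokens


def tokenize_message(raw: str, headers: tuple[str, ...] = ("From", "Subject")) -> list[str]:
    """Tokenize selected headers (with a header-name prefix) and all text parts."""
    msg = message_from_string(raw)
    tokens = []
    for name in headers:
        value = msg.get(name)
        if value:
            tokens.extend(tokenize(str(value), prefix=name.lower() + ":"))
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skip multipart containers and non-text attachments
        payload = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        if not payload:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:  # unknown charset name in the message
            text = payload.decode("utf-8", errors="replace")
        tokens.extend(tokenize(text))
    return tokens


print(tokenize_message("From: a@example.com\nSubject: FREE MONEY\n\nClick here!"))
```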
## Installation
### Using Poetry (Recommended)
```bash
git clone https://github.com/huntsberg/mailprobe-py
cd mailprobe-py
poetry install
```
### Using pip
```bash
pip install mailprobe-py
```
For development:
```bash
git clone https://github.com/huntsberg/mailprobe-py
cd mailprobe-py
poetry install --with dev
```
## Quick Start
### Command Line Usage
1. **Create a database**:
   ```bash
   poetry run mailprobe-py create-db
   ```
2. **Train on existing emails**:
   ```bash
   poetry run mailprobe-py good ~/mail/inbox
   poetry run mailprobe-py spam ~/mail/spam
   ```
3. **Score new emails**:
   ```bash
   poetry run mailprobe-py score < new_email.txt
   ```
### Python API Usage (Recommended for Integration)
```python
from mailprobe import MailProbeAPI

# Create email classifier
with MailProbeAPI() as spam_filter:
    # Train on messages
    spam_filter.train_good(["From: friend@example.com\nSubject: Meeting\n\nLet's meet tomorrow."])
    spam_filter.train_spam(["From: spammer@bad.com\nSubject: FREE MONEY\n\nClick here!"])

    # Classify new messages
    email_content = "From: unknown@example.com\nSubject: Question\n\nI have a question."
    is_spam = spam_filter.classify_text(email_content)
    print(f"Is spam: {is_spam}")

    # Get detailed results
    result = spam_filter.classify_text(email_content, return_details=True)
    print(f"Probability: {result.probability:.3f}")
```
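The examples above write messages as hand-built strings. If you generate messages programmatically, the standard-library `email.message.EmailMessage` class produces well-formed headers for you. This sketch assumes, as the examples suggest, that `train_good`, `train_spam`, and `classify_text` accept any raw RFC 5322 message string; `build_message` is a hypothetical helper, not part of MailProbe-Py.

```python
from email.message import EmailMessage


def build_message(sender: str, subject: str, body: str) -> str:
    """Hypothetical helper: build a raw message string with proper MIME headers."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["Subject"] = subject
    msg.set_content(body)  # also adds Content-Type, Content-Transfer-Encoding, MIME-Version
    return msg.as_string()


print(build_message("friend@example.com", "Meeting", "Let's meet tomorrow."))
```

For example, `spam_filter.train_good([build_message("friend@example.com", "Meeting", "Let's meet tomorrow.")])` would train on the same content as the first example, plus the MIME headers that `set_content` adds.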
### Convenience Functions
```python
from mailprobe import classify_email, get_spam_probability
# Quick classification
is_spam = classify_email("From: test@example.com\nSubject: Test\n\nHello")
probability = get_spam_probability("From: test@example.com\nSubject: Test\n\nHello")
```
### Multi-Category Classification
MailProbe-Py supports classification into multiple categories beyond just spam/not-spam:
```python
from mailprobe import MultiCategoryFilter, FolderBasedClassifier

# Placeholder data for this example (assumed format: lists of raw message strings)
work_emails = ["From: boss@example.com\nSubject: Q3 report\n\nPlease review the report."]
personal_emails = ["From: mom@example.com\nSubject: Dinner\n\nAre you coming on Sunday?"]
email_content = "From: colleague@example.com\nSubject: Standup\n\nMoving standup to 10am."

# Multi-category classification
categories = ['work', 'personal', 'newsletters', 'spam']
with MultiCategoryFilter(categories) as classifier:
    # Train on different categories
    classifier.train_category('work', work_emails)
    classifier.train_category('personal', personal_emails)

    # Classify into categories
    result = classifier.classify(email_content)
    print(f"Category: {result.category}, Confidence: {result.confidence:.3f}")

# Folder-based classification (auto-discovers categories from folders)
with FolderBasedClassifier('emails/') as classifier:
    classifier.train_from_folders()  # Train from folder structure
    result = classifier.classify(email_content)
    print(f"Should go to folder: {result.category}")
```
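`FolderBasedClassifier` is described as discovering categories from folders. One plausible layout, assumed here rather than taken from MailProbe-Py's documentation, is one subfolder per category holding one message per file. The hypothetical helper below lists such a tree with `pathlib`, which can be handy for checking what a classifier would see before training.

```python
from pathlib import Path


def discover_categories(root: str) -> dict[str, list[Path]]:
    """Map each immediate subfolder of `root` to the message files it contains.

    Hypothetical helper: assumes one subfolder per category and one message
    per file (e.g. emails/work/001.eml, emails/spam/002.eml). Files directly
    inside `root` are ignored.
    """
    categories: dict[str, list[Path]] = {}
    for folder in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        categories[folder.name] = sorted(f for f in folder.iterdir() if f.is_file())
    return categories
```

Calling `discover_categories("emails/")` on the layout above would return something like `{"spam": [...], "work": [...]}`, with category names in sorted order.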
## Commands
- `create-db` - Create a new spam database
- `train` - Score and selectively train on messages
- `receive` - Score and always train on messages
- `score` - Score messages without updating database
- `good` - Mark messages as non-spam
- `spam` - Mark messages as spam
- `remove` - Remove messages from database
- `dump` - Export database contents
- `cleanup` - Remove old/rare terms from database
- `help` - Show detailed help for commands
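The difference between `train` and `receive` is that `train` is selective. This README does not state the exact rule `train` uses; a common heuristic for selective training (often called train-on-error) is to learn only from messages that were misclassified or that scored in an uncertain middle band. The sketch below shows that heuristic; the function name `should_train` and its thresholds are hypothetical.

```python
def should_train(
    score: float,
    is_spam: bool,
    spam_threshold: float = 0.9,
    uncertain: tuple[float, float] = (0.1, 0.9),
) -> bool:
    """Hypothetical train-on-error rule: learn from mistakes and close calls."""
    predicted_spam = score >= spam_threshold
    if predicted_spam != is_spam:
        return True  # misclassified: always learn from the mistake
    low, high = uncertain
    return low < score < high  # correct, but not confidently so
```

Skipping confidently correct messages tends to keep the database smaller than always training, which is what `receive` does.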
## Configuration
MailProbe-Py stores its database in `~/.mailprobe-py/` by default. You can change this with the `-d` option:
```bash
poetry run mailprobe-py -d /path/to/database score < email.txt
```
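Scripts that wrap MailProbe-Py sometimes need to know which database a run will use. This sketch mirrors the lookup described above (the `-d` value if given, otherwise `~/.mailprobe-py/`); `resolve_db_dir` is a hypothetical helper, not part of MailProbe-Py.

```python
from pathlib import Path
from typing import Optional

# Default location described in this README.
DEFAULT_DB_DIR = Path.home() / ".mailprobe-py"


def resolve_db_dir(cli_value: Optional[str] = None) -> Path:
    """Return the -d value (with ~ expanded) if given, else the default directory."""
    if cli_value:
        return Path(cli_value).expanduser()
    return DEFAULT_DB_DIR
```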
## Integration with Mail Systems
### Procmail Example
Add to your `.procmailrc`:
```
:0 fw
| poetry run mailprobe-py receive
:0:
* ^X-MailProbe: SPAM
spam/
```
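The second recipe files any message whose `X-MailProbe` header begins with `SPAM` into `spam/` (procmail conditions are case-insensitive by default). If you post-process scored mail in Python instead, the same routing decision can be sketched with the standard-library `email` parser. The folder names and the `route` function are hypothetical, and the sketch assumes only the header format implied by the recipe.

```python
from email import message_from_string


def route(raw_message: str, spam_folder: str = "spam/", default_folder: str = "inbox/") -> str:
    """Pick a destination folder from the X-MailProbe header, like the procmail recipe."""
    msg = message_from_string(raw_message)
    verdict = (msg.get("X-MailProbe") or "").strip()
    # procmail matches "^X-MailProbe: SPAM" case-insensitively, so compare in upper case.
    if verdict.upper().startswith("SPAM"):
        return spam_folder
    return default_folder
```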
### Postfix Example
Configure as a content filter in Postfix to automatically score incoming mail.
## Accuracy
Like the original MailProbe, this implementation typically achieves:
- 99%+ spam detection rate
- Very low false-positive rate (< 0.1%)

Accuracy improves further as you train it on your own email.
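Results on your own mail will depend on how much training data you have. To check the detection and false-positive rates yourself, score a labelled set of messages and compute the two rates as below. The `evaluate` helper is hypothetical: `predictions` would be the spam/not-spam decisions from MailProbe-Py and `labels` the true classes.

```python
def evaluate(predictions: list[bool], labels: list[bool]) -> dict[str, float]:
    """Compute spam detection rate and false-positive rate.

    predictions[i] is True if message i was classified as spam;
    labels[i] is True if message i really is spam.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    pairs = list(zip(predictions, labels))
    tp = sum(1 for pred, actual in pairs if pred and actual)          # spam caught
    fn = sum(1 for pred, actual in pairs if not pred and actual)      # spam missed
    fp = sum(1 for pred, actual in pairs if pred and not actual)      # good mail flagged
    tn = sum(1 for pred, actual in pairs if not pred and not actual)  # good mail passed
    return {
        "detection_rate": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


print(evaluate([True, True, False, False, True], [True, True, True, False, False]))
```

Measuring a false-positive rate at the 0.1% level needs a large sample: a single misfiled message out of 1,000 good messages is already 0.1%.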
## Testing
MailProbe-Py includes a comprehensive test suite with **97 tests** and **81% code coverage**.
### Running Tests
```bash
# Run all tests
poetry run pytest
# Run tests with coverage
poetry run pytest --cov=src/mailprobe --cov-report=html
# Run specific test categories
poetry run pytest tests/test_api.py # API tests
poetry run pytest tests/test_cli.py # CLI tests
poetry run pytest tests/test_filter.py # Core filtering tests
# Use the convenient test runner
python run_tests.py quick # Quick test run
python run_tests.py coverage # With coverage report
python run_tests.py html # Generate HTML report
```
### Test Coverage
- **API Tests**: High-level object-oriented interface
- **CLI Tests**: Command-line interface functionality
- **Core Tests**: Spam filtering, database, tokenization
- **Integration Tests**: End-to-end workflows
- **Error Handling**: Edge cases and invalid inputs
All tests pass successfully and the library is ready for production use.
## Documentation
- **[USAGE.md](USAGE.md)** - Comprehensive usage guide with examples
- **[OO_API_GUIDE.md](OO_API_GUIDE.md)** - Object-oriented API documentation
- **[MULTI_CATEGORY_GUIDE.md](MULTI_CATEGORY_GUIDE.md)** - Multi-category classification guide
- **[DEVELOPMENT.md](DEVELOPMENT.md)** - Development setup and guidelines
- **[TEST_SUMMARY.md](TEST_SUMMARY.md)** - Detailed test documentation
- **[CHANGELOG.md](CHANGELOG.md)** - Version history and changes
## License
MIT License - see LICENSE file for details.
## Acknowledgments
Based on the original MailProbe by Burton Computer Corporation, inspired by Paul Graham's "A Plan for Spam" article.
## Package Metadata
- **Version**: 0.1.0 (uploaded to PyPI 2025-09-01)
- **Author and maintainer**: Peter Bowen (peter-mailprobe@bowenfamily.org)
- **Python**: >=3.9, <4.0
- **Bug tracker**: https://github.com/huntsberg/mailprobe-py/issues