# 📧 emailtoolkit
> RFC‑aware email parsing, normalization, extraction, and DNS health checks with a clean, **phonenumbers‑style** API.
---
## ✨ Design goals
* **Simple API**
Aim to be as easy to use as `phonenumbers`. Import module‑level functions for quick tasks, or instantiate `EmailTools` for tuned, high‑performance use.
* **Practical validation**
Separate syntax validation (via `email_validator`) from deliverability checks. Enforce your own DNS policy (require MX, or allow A/AAAA fallback).
* **Provider‑aware identity**
Correctly determine that `test.user@gmail.com` and `testuser+sales@googlemail.com` are the same identity using canonicalization rules.
* **Operations‑ready**
Native env, `.env`, and `config.json` support; PII‑safe logging; TTL‑cached DNS; robust CLI.
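
To make the provider‑aware identity goal concrete, here is a minimal, stdlib‑only sketch of Gmail‑style canonicalization. It is an illustration of the rules described above (ignore dots, drop the `+tag`, fold `googlemail.com` into `gmail.com`), not the library's actual implementation; the function name `gmail_style_canonical` is hypothetical.

```python
# Hypothetical helper, for illustration only; emailtoolkit's own logic may differ.
GMAIL_DOMAINS = {"gmail.com", "googlemail.com"}


def gmail_style_canonical(address: str) -> str:
    """Collapse Gmail aliases to one identity; leave other providers alone."""
    local, _, domain = address.rpartition("@")
    domain = domain.lower()
    if domain in GMAIL_DOMAINS:
        local = local.split("+", 1)[0]  # drop the "+tag" suffix
        local = local.replace(".", "")  # dots are not significant
        local = local.lower()           # local part is case-insensitive here
        domain = "gmail.com"            # googlemail.com is an alias
    return f"{local}@{domain}"


same = (
    gmail_style_canonical("test.user@gmail.com")
    == gmail_style_canonical("testuser+sales@googlemail.com")
)
print(same)  # True
```

For non‑Gmail domains the sketch only lowercases the domain, since dot and plus handling are provider‑specific.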
---
## ⭐ Features
* **Automatic Cloudflare Decoding**: Transparently finds and decodes Cloudflare-protected email addresses from HTML.
* **Robust Extraction**: Discovers emails from free text, `mailto:` links, and other common formats.
* **Canonical Identity**: Intelligently compares emails, understanding that `test.user@gmail.com` is the same as `testuser+sales@googlemail.com`.
* **DNS Health Checks**: Validates domain deliverability by checking for MX and A/AAAA records.
* **Disposable Domain Filtering**: Flags or blocks emails from known disposable providers.
* **Configurable**: Fine-tune behavior with environment variables, `.env` files, or a `config.json`.
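
As background for the Cloudflare decoding feature: Cloudflare's email obfuscation typically places a hex string in a `data-cfemail` attribute, where the first byte is an XOR key for the remaining bytes. The sketch below shows that scheme with the standard library only. The helper names (`decode_cfemail`, `encode_cfemail`, `find_cfemails`) are hypothetical and are not part of emailtoolkit's API.

```python
import re

# Matches the hex payload Cloudflare stores in the data-cfemail attribute.
CFEMAIL_RE = re.compile(r'data-cfemail="([0-9a-fA-F]+)"')


def decode_cfemail(hex_payload: str) -> str:
    """First byte is the XOR key; the remaining bytes are the masked address."""
    data = bytes.fromhex(hex_payload)
    key = data[0]
    return bytes(b ^ key for b in data[1:]).decode("utf-8")


def encode_cfemail(address: str, key: int = 0x5A) -> str:
    """Inverse of decode_cfemail, used here only to build a sample payload."""
    return bytes([key] + [b ^ key for b in address.encode("utf-8")]).hex()


def find_cfemails(html: str) -> list[str]:
    """Decode every data-cfemail payload found in an HTML fragment."""
    return [decode_cfemail(payload) for payload in CFEMAIL_RE.findall(html)]


payload = encode_cfemail("a@example.com")
html = f'<a class="__cf_email__" data-cfemail="{payload}">[email protected]</a>'
print(find_cfemails(html))  # ['a@example.com']
```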
---
## 🚀 Installation
```bash
pip install emailtoolkit
# extras for DNS and .env support
pip install "emailtoolkit[dns,dotenv]"
```
---
## 🧪 Quick start
```python
import emailtoolkit as et
# Validate
et.is_valid("Test.User+sales@Gmail.com") # True
# Canonical form (provider‑specific rules)
et.canonical("t.e.s.t+sales@googlemail.com") # "test@gmail.com"
# Compare by canonical identity
et.compare("t.e.s.t+sales@googlemail.com", "test@gmail.com") # True
# Extract from free text (returns Email objects)
found = et.extract("Contact a@example.com, A@EXAMPLE.com, and junk@@bad.")
print([e.normalized for e in found]) # ["a@example.com", "A@example.com"]
```
---
## 🛠️ Command‑line interface (CLI)
```bash
# Canonical form
emailtoolkit canonical "t.e.s.t+bar@googlemail.com"
# → test@gmail.com
# Domain DNS health (JSON)
emailtoolkit domain example.com
# {
#   "domain": "example.com",
#   "ascii_domain": "example.com",
#   "mx_hosts": [],
#   "a_hosts": ["93.184.216.34"],
#   "has_mx": false,
#   "has_a": true,
#   "disposable": false
# }
# Extract from stdin
echo "Contact me at a@example.com" | emailtoolkit extract
```
---
## ⚙️ Configuration
Load precedence (highest first):
1. Environment variables (e.g., `EMAILTK_LOG_LEVEL`)
2. `.env` in the working directory (requires `dotenv` extra)
3. `config.json` (when passed to CLI `--config` or `build_tools("/path/to/config.json")`)
4. Internal defaults
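
The precedence order above can be pictured as layered lookups. The following is a rough sketch using `collections.ChainMap`, not the library's loader: it assumes `config.json` uses the same `EMAILTK_*` key names as the environment, which the source does not confirm, and `read_dotenv` and `resolve_settings` are hypothetical names.

```python
import json
from collections import ChainMap
from pathlib import Path
from typing import Mapping, Optional

# A subset of the documented defaults, kept as strings like env values.
DEFAULTS = {"EMAILTK_LOG_LEVEL": "INFO", "EMAILTK_REQUIRE_MX": "true"}


def read_dotenv(path: Path) -> dict:
    """Tiny KEY=VALUE reader; a real setup would rely on the dotenv extra."""
    values = {}
    if not path.is_file():
        return values
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def resolve_settings(
    environ: Mapping[str, str],
    dotenv_path: Optional[Path] = None,
    config_path: Optional[Path] = None,
) -> ChainMap:
    """Earlier layers win: environment, then .env, then config.json, then defaults."""
    env_layer = {k: v for k, v in environ.items() if k.startswith("EMAILTK_")}
    dotenv_layer = read_dotenv(dotenv_path) if dotenv_path else {}
    json_layer = {}
    if config_path and config_path.is_file():
        # Assumption: config.json is a flat object keyed by EMAILTK_* names.
        json_layer = json.loads(config_path.read_text(encoding="utf-8"))
    return ChainMap(env_layer, dotenv_layer, json_layer, DEFAULTS)


settings = resolve_settings({"EMAILTK_LOG_LEVEL": "DEBUG"})
print(settings["EMAILTK_LOG_LEVEL"], settings["EMAILTK_REQUIRE_MX"])  # DEBUG true
```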
### Environment variables (full)
| Variable | Type | Default | Description |
| ------------------------------------- | ------------------------------------- | ------------- | --------------------------------------------------------------------- |
| `EMAILTK_LOG_LEVEL` | str | `INFO` | Logging level: `DEBUG` `INFO` `WARNING` `ERROR` |
| `EMAILTK_REQUIRE_MX` | bool | `true` | If true, deliverability requires MX. If false, MX or A/AAAA is enough |
| `EMAILTK_REQUIRE_DELIVERABILITY` | bool | `false` | If true, `parse` raises if deliverability fails |
| `EMAILTK_ALLOW_SMTPUTF8` | bool | `true` | Allow UTF‑8 local parts per RFC 6531 |
| `EMAILTK_DNS_TIMEOUT_SECONDS` | float | `2.0` | DNS timeout seconds |
| `EMAILTK_DNS_TTL_SECONDS` | int | `900` | TTL for cached DNS answers |
| `EMAILTK_USE_DNSPYTHON` | bool | `true` | Use dnspython when available |
| `EMAILTK_EXTRACT_UNIQUE` | bool | `true` | Deduplicate by canonical form during extraction |
| `EMAILTK_EXTRACT_MAX_RESULTS` | int or empty | empty | Hard cap on extractor results. Empty or 0 means no cap |
| `EMAILTK_NORMALIZE_CASE` | bool | `true` | Lowercase domain on normalize |
| `EMAILTK_GMAIL_CANON` | bool | `true` | Apply Gmail dot and plus canonicalization rules |
| `EMAILTK_TREAT_DISPOSABLE_AS_INVALID` | bool | `false` | If true, disposable domains cause `parse` to raise |
| `EMAILTK_BLOCK_PRIVATE_TLDS` | bool | `false` | Enforce known public suffixes if provided |
| `EMAILTK_PUBLIC_SUFFIX_FILE` | path | empty | File with known public suffixes, one per line |
| `EMAILTK_DISPOSABLE_SOURCE` | `file://...` or `url://...` or `none` | `none` | Source for disposable domains |
| `EMAILTK_ENABLE_SMTP_PROBE` | bool | `false` | Reserved for optional SMTP probing module |
| `EMAILTK_SMTP_PROBE_TIMEOUT` | float | `3.0` | Probe timeout |
| `EMAILTK_SMTP_PROBE_CONCURRENCY` | int | `5` | Probe concurrency |
| `EMAILTK_SMTP_PROBE_HELO` | str | `example.com` | HELO/EHLO identity |
| `EMAILTK_PII_REDACT_LOGS` | bool | `true` | Mask emails in logs and exceptions |
| `EMAILTK_PII_REDACT_STYLE` | `mask` or `none` | `mask` | Redaction style |
See `.env.example` for a ready‑to‑copy template.
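
As a worked reading of `EMAILTK_REQUIRE_MX` from the table, the deliverability decision reduces to a small boolean rule. The sketch below is an interpretation of the documented behavior, not code from the library; `env_bool`, `is_deliverable`, and the set of accepted truthy spellings are assumptions.

```python
from typing import Optional

# Assumed truthy spellings; the library may accept a different set.
TRUE_VALUES = {"1", "true", "yes", "on"}


def env_bool(raw: Optional[str], default: bool) -> bool:
    """Interpret an environment string as a boolean, falling back when unset."""
    if raw is None or raw.strip() == "":
        return default
    return raw.strip().lower() in TRUE_VALUES


def is_deliverable(has_mx: bool, has_a: bool, require_mx: bool = True) -> bool:
    """With require_mx, only MX counts; otherwise MX or A/AAAA is enough."""
    if require_mx:
        return has_mx
    return has_mx or has_a


# A domain with A records but no MX, like the CLI sample output above.
print(is_deliverable(has_mx=False, has_a=True, require_mx=env_bool("true", True)))   # False
print(is_deliverable(has_mx=False, has_a=True, require_mx=env_bool("false", True)))  # True
```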
---
## 🧱 Disposable domain filtering
Create a text file with one domain per line, then point `EMAILTK_DISPOSABLE_SOURCE` at it:
```text
# disposable.txt
# Lines beginning with # are comments
# Domains are matched case‑insensitively on ASCII form
mailinator.com
10minutemail.com
sharklasers.com
```
Enable via `.env`:
```env
EMAILTK_DISPOSABLE_SOURCE=file://./disposable.txt
```
Optionally set:
```env
EMAILTK_TREAT_DISPOSABLE_AS_INVALID=true
```
With this set, `parse` raises `EmailParseException` for addresses on the listed domains.
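
For readers who want to see how a list in this format could be interpreted, here is a small stdlib‑only sketch: skip blank and `#` lines, then compare against the lowercased ASCII (IDNA) form of the domain. It mirrors the file format shown above but is not emailtoolkit's loader; both function names are hypothetical.

```python
def load_disposable_domains(text: str) -> frozenset:
    """Parse the list format above: one domain per line, '#' starts a comment line."""
    domains = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        domains.add(line.lower())
    return frozenset(domains)


def is_disposable(address_or_domain: str, domains: frozenset) -> bool:
    """Match case-insensitively on the ASCII form of the domain."""
    domain = address_or_domain.rpartition("@")[2]
    try:
        # Python's built-in "idna" codec converts Unicode labels to ASCII.
        ascii_domain = domain.encode("idna").decode("ascii")
    except UnicodeError:
        return False
    return ascii_domain.lower() in domains


sample = """# disposable.txt
mailinator.com
10minutemail.com
sharklasers.com
"""
blocked = load_disposable_domains(sample)
print(is_disposable("someone@MAILINATOR.com", blocked))  # True
print(is_disposable("someone@example.com", blocked))     # False
```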
---
## 🤖 Agents, MCP servers, and tool‑calling
```python
from pydantic import BaseModel, Field
import emailtoolkit as et

class EmailInput(BaseModel):
    email: str = Field(..., description="Email address to parse")


class DomainInput(BaseModel):
    domain: str = Field(..., description="Domain to inspect")


def tool_parse(args: EmailInput):
    e = et.parse(args.email)
    return {
        "normalized": e.normalized,
        "canonical": e.canonical,
        "deliverable": e.deliverable_dns,
        "domain": e.domain_info.ascii_domain,
    }


def tool_domain(args: DomainInput):
    d = et.domain_health(args.domain)
    return {
        "domain": d.ascii_domain,
        "has_mx": d.has_mx,
        "has_a": d.has_a,
        "disposable": d.disposable,
    }
```
---
## 📚 API surface
```python
import emailtoolkit as et
from emailtoolkit import EmailTools, Email, DomainInfo, EmailParseException
# module functions
et.parse(raw: str) -> Email
et.is_valid(raw: str) -> bool
et.normalize(raw: str) -> str
et.canonical(raw: str) -> str
et.extract(text: str) -> list[Email]
et.compare(a: str, b: str) -> bool
et.domain_health(domain: str) -> DomainInfo
et.build_tools(overrides_path: str | None = None) -> EmailTools
# dataclasses
Email(
    original, local, domain, ascii_email, normalized, canonical,
    domain_info: DomainInfo, valid_syntax: bool, deliverable_dns: bool, reason: str|None
)
DomainInfo(domain, ascii_domain, mx_hosts, a_hosts, has_mx, has_a, disposable)
```
---
## 🔒 Security & privacy
* PII redaction in logs is on by default (`EMAILTK_PII_REDACT_LOGS`).
* Avoid logging raw addresses in your application.
* If SMTP probing is enabled in the future, keep it opt‑in, rate‑limited, and legally reviewed.
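
If your own application logs addresses, a similar masking step can be applied there. The sketch below shows one possible approach with a `logging.Filter`; the mask format (`a***@example.com`) and the names `mask_emails` and `RedactEmails` are assumptions for illustration and may not match what `EMAILTK_PII_REDACT_STYLE=mask` produces.

```python
import logging
import re

# Keeps the first character of the local part and the full domain.
EMAIL_RE = re.compile(r"([A-Za-z0-9._%+-])[A-Za-z0-9._%+-]*@([A-Za-z0-9.-]+)")


def mask_emails(text: str) -> str:
    """Replace the local part of each address-like token with a short mask."""
    return EMAIL_RE.sub(r"\1***@\2", text)


class RedactEmails(logging.Filter):
    """Logging filter that masks addresses in the fully formatted message."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = mask_emails(record.getMessage())
        record.args = None  # message is already formatted
        return True


logger = logging.getLogger("myapp")
logger.addFilter(RedactEmails())
logger.warning("bounce for %s", "alice@example.com")  # logs: bounce for a***@example.com
```

Filtering on the formatted message catches addresses passed as arguments as well as those embedded in the format string.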
---
## 🧰 Development
```bash
pip install -e ".[dns,dotenv]" pytest ruff mypy
ruff check src
mypy src/emailtoolkit
pytest -q
```
---
## 🙏 Acknowledgments
Built on:
* **email\_validator** by Joshua Tauberer (Unlicense)
* **dnspython** (ISC) \[optional]
* **idna** (BSD‑3‑Clause)
See `THIRD_PARTY_NOTICES.md` for license texts.
---
## 📦 License
MIT. See [LICENSE](LICENSE). Third‑party licenses in [THIRD\_PARTY\_NOTICES.md](THIRD_PARTY_NOTICES.md).
---
## ⭐ Support
If this toolkit helps you, star the repo and share it. Issues and PRs welcome.
Raw data
{
"_id": null,
"home_page": null,
"name": "emailtoolkit",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "dns, email, email-validator, idna, normalizer, parser, rfc, validation",
"author": null,
"author_email": "ItsYourBoyRoy <roy.dawson.iv@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/06/5e/38955d7025498f624189588256290b1b10ac79dd372810f30bc9590af9eb/emailtoolkit-0.1.4.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "RFC-aware email parsing, normalization, extraction, and DNS health checks with env-config and a phonenumbers-like API.",
"version": "0.1.4",
"project_urls": {
"Bug Tracker": "https://github.com/imyourboyroy/emailtoolkit/issues",
"Homepage": "https://github.com/imyourboyroy/emailtoolkit",
"Repository": "https://github.com/imyourboyroy/emailtoolkit"
},
"split_keywords": [
"dns",
" email",
" email-validator",
" idna",
" normalizer",
" parser",
" rfc",
" validation"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "dc06e6997b7c8b6f0b17eb4a05ac30e465fcaa9fbc16b927131c5041a76dc7e8",
"md5": "b7884ff5ec22d62366184da9fb0a79c0",
"sha256": "ed8d24a37cebf84843006d31fc99b4ba361f19f1cb33a2f9797be37e5c5d8048"
},
"downloads": -1,
"filename": "emailtoolkit-0.1.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "b7884ff5ec22d62366184da9fb0a79c0",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 17948,
"upload_time": "2025-08-13T23:39:42",
"upload_time_iso_8601": "2025-08-13T23:39:42.116707Z",
"url": "https://files.pythonhosted.org/packages/dc/06/e6997b7c8b6f0b17eb4a05ac30e465fcaa9fbc16b927131c5041a76dc7e8/emailtoolkit-0.1.4-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "065e38955d7025498f624189588256290b1b10ac79dd372810f30bc9590af9eb",
"md5": "4c94b13b6306a8921ad86c465b668654",
"sha256": "bd1666ac7a68fd684d08ba31c0ad2604cd25bc35438c064ea9c5c2eb88ff756d"
},
"downloads": -1,
"filename": "emailtoolkit-0.1.4.tar.gz",
"has_sig": false,
"md5_digest": "4c94b13b6306a8921ad86c465b668654",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 13354,
"upload_time": "2025-08-13T23:39:43",
"upload_time_iso_8601": "2025-08-13T23:39:43.624735Z",
"url": "https://files.pythonhosted.org/packages/06/5e/38955d7025498f624189588256290b1b10ac79dd372810f30bc9590af9eb/emailtoolkit-0.1.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-13 23:39:43",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "imyourboyroy",
"github_project": "emailtoolkit",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "emailtoolkit"
}