email-typo-fixer 1.1.0 (PyPI package metadata)

- Author: Joao Brito
- Home page: https://github.com/machado000/email-typo-fixer
- License: MIT
- Requires Python: >=3.10, <4.0
- Keywords: email, typo, correction, validation, normalization
- Uploaded: 2025-08-13 11:24:09 (UTC)

# Email Typo Fixer

[![Python Support](https://img.shields.io/pypi/pyversions/email-typo-fixer.svg)](https://pypi.org/project/email-typo-fixer/)
[![PyPI version](https://img.shields.io/pypi/v/email-typo-fixer)](https://pypi.org/project/email-typo-fixer/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Issues](https://img.shields.io/github/issues/machado000/email-typo-fixer)](https://github.com/machado000/email-typo-fixer/issues)

A Python library that automatically detects and fixes common typos in email addresses. It combines Levenshtein-distance matching, a dictionary of known domain typos, and validation against the Public Suffix List.

## Features

- **Email Normalization**: Lowercases, trims surrounding whitespace, and removes invalid characters
- **Extension Validation**: Validates and corrects TLDs using the official [PublicSuffixList](https://pypi.org/project/publicsuffixlist/) (parses `.dat` file directly)
- **Smart Typo Detection**: Uses Levenshtein distance to detect and correct TLD and domain name typos
- **Domain Correction**: Fixes common domain typos (e.g., `gamil.com` → `gmail.com`)
- **Configurable**: Custom typo dictionary and distance thresholds
- **Logging Support**: Built-in logging for debugging and monitoring


## Installation

```bash
pip install email-typo-fixer
```

## Quick Start

```python
from email_typo_fixer import normalize_email, EmailTypoFixer

# Simple function interface
corrected_email = normalize_email("user@gamil.com")
print(corrected_email)  # user@gmail.com

# Class interface for more control
fixer = EmailTypoFixer(max_distance=1)
corrected_email = fixer.normalize("user@yaho.com")
print(corrected_email)  # user@yahoo.com
```


## Limitations

### TLD '.co' False Positives

By default, the library may correct emails ending in `.co` (such as `user@example.co`) to `.com`. Turning `co` into `com` takes a single insertion, a Levenshtein distance of 1, which is within the default `max_distance=1`. This produces false positives for legitimate `.co` addresses: `.co` is Colombia's country-code TLD and is also widely registered by organizations outside Colombia.

**How to control this behavior:**

- The `normalize` method and the `normalize_email` function accept an optional parameter `fix_tld_co: bool` (default: `True`).
- If you want to prevent `.co` domains from being auto-corrected to `.com`, call:

```python
from email_typo_fixer import normalize_email

normalize_email("user@example.co", fix_tld_co=False)  # Will NOT change .co to .com
```

Or, with the class:

```python
from email_typo_fixer import EmailTypoFixer

fixer = EmailTypoFixer()
fixer.normalize("user@example.co", fix_tld_co=False)  # Will NOT change .co to .com
```

This gives you control to avoid unwanted corrections for `.co` domains.
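The following sketch shows why `.co` is at risk under a distance-only rule and how an opt-out flag can short-circuit it. It is illustrative only. `levenshtein`, `correct_tld`, and the `COMMON_TLDS` target list are hypothetical names. The library itself uses RapidFuzz for distances and the Public Suffix List for validation, and its exact rules may differ.

```python
# Hypothetical sketch: nearest-TLD correction with a .co opt-out.
# Not the library's implementation; names and the target list are illustrative.

COMMON_TLDS = ["com", "net", "org"]  # assumed correction targets for this demo


def levenshtein(a: str, b: str) -> int:
    """Edit distance where insert, delete, and substitute each cost 1."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def correct_tld(tld: str, max_distance: int = 1, fix_tld_co: bool = True) -> str:
    """Return the closest target TLD within max_distance, else tld unchanged."""
    if tld == "co" and not fix_tld_co:
        return tld  # opt-out: keep genuine .co addresses as written
    best = min(COMMON_TLDS, key=lambda t: (levenshtein(tld, t), t))
    return best if levenshtein(tld, best) <= max_distance else tld


print(levenshtein("co", "com"))             # 1
print(correct_tld("co"))                    # com
print(correct_tld("co", fix_tld_co=False))  # co
```

In this sketch, a pure distance rule also pulls other short, valid TLDs such as `cm` toward `.com`. A real implementation therefore needs to check whether the input is already a valid suffix before correcting it.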


## Usage Examples

### Basic Email Correction

```python
from email_typo_fixer import normalize_email

# Fix common domain typos
normalize_email("john.doe@gamil.com")     # → john.doe@gmail.com
normalize_email("jane@yaho.com")         # → jane@yahoo.com
normalize_email("user@outlok.com")       # → user@outlook.com
normalize_email("test@hotmal.com")       # → test@hotmail.com

# Fix extension (TLD) typos, checked against the Public Suffix List
normalize_email("user@example.co")       # → user@example.com (default fix_tld_co=True)
normalize_email("user@site.rog")         # → user@site.org
```

### Robust Suffix Handling

This library parses the official `public_suffix_list.dat` file at runtime instead of hardcoding suffixes. The TLDs and public suffixes it recognizes are therefore as current as the copy of that file being read.
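The Public Suffix List is a plain-text format. It holds one rule per line and reads each line only up to the first whitespace. It also has `//` comment lines, `*.` wildcard rules, and `!` exception rules. The hypothetical `parse_psl` and `top_level_domains` helpers below sketch how TLDs could be extracted from that format. They read a small inline sample rather than the real `public_suffix_list.dat`, and they are not the library's own parser.

```python
# Hypothetical sketch of reading PSL-formatted rules; not the library's parser.

SAMPLE_PSL = """\
// ===BEGIN ICANN DOMAINS===
com
org
co
com.co
*.ck
!www.ck
// ===END ICANN DOMAINS===
"""


def parse_psl(text: str) -> set[str]:
    """Return every rule found in PSL-formatted text."""
    rules = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("//"):
            continue  # skip blank lines and comments
        rules.add(line.split()[0])  # a rule ends at the first whitespace
    return rules


def top_level_domains(rules: set[str]) -> set[str]:
    """Single-label rules are TLDs; wildcard (*.ck) and exception (!www.ck) rules contain a dot."""
    return {r for r in rules if "." not in r}


print(sorted(top_level_domains(parse_psl(SAMPLE_PSL))))  # ['co', 'com', 'org']
```

The real file contains internationalized (non-ASCII) entries, so it should be opened with `encoding="utf-8"`.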

### Advanced Usage with Custom Configuration

```python
from email_typo_fixer import EmailTypoFixer
import logging

# Create a custom logger
logger = logging.getLogger("email_fixer")
logger.setLevel(logging.INFO)

# Custom typo dictionary
custom_typos = {
    'companytypo': 'company',
    'orgtypo': 'org',
}

# Initialize with custom settings
fixer = EmailTypoFixer(
    max_distance=2,           # Allow more distant corrections
    typo_domains=custom_typos, # Use custom typo dictionary
    logger=logger             # Use custom logger
)

# Fix emails with custom rules
corrected = fixer.normalize("user@companytypo.com")
print(corrected)  # user@company.com
```
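In the example above, `logger.setLevel(logging.INFO)` only sets a threshold; it does not attach a handler. If neither this logger nor the root logger has a handler, Python falls back to its last-resort handler, which prints only WARNING and above. INFO records would then not appear. The sketch below attaches a handler. The format string is just an example, and this README does not document which messages the library logs at which level.

```python
import logging
import sys

logger = logging.getLogger("email_fixer")
logger.setLevel(logging.INFO)

# Attach a handler so INFO records are written to stdout.
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(logging.Formatter("%(name)s %(levelname)s: %(message)s"))
logger.addHandler(handler)

logger.info("logging configured")  # email_fixer INFO: logging configured

# Then pass it in as before, e.g.:
# fixer = EmailTypoFixer(logger=logger)
```

Calling `logging.basicConfig(level=logging.INFO)` once at application startup is a shorter alternative. It configures the root logger, and this logger's records propagate to it.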

### Email Validation and Normalization

```python
from email_typo_fixer import EmailTypoFixer

fixer = EmailTypoFixer()

try:
    # Normalize and validate
    email = fixer.normalize("  USER@EXAMPLE.COM  ")
    print(email)  # user@example.com
    
    # Remove invalid characters
    email = fixer.normalize("us*er@exam!ple.com")
    print(email)  # user@example.com
    
except ValueError as e:
    print(f"Invalid email: {e}")
```
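The normalization step can be approximated in three steps: trim, lowercase, then drop characters outside an allow-list. The hypothetical `basic_normalize` below uses an assumed allow-list: `a-z`, `0-9`, `.`, `_`, `%`, `+`, `-`, and `@`. The library's real character rules and validation checks may differ, and this sketch performs no typo correction.

```python
import re

# Hypothetical allow-list; the library's actual character rules may differ.
_INVALID_CHARS = re.compile(r"[^a-z0-9._%+\-@]")


def basic_normalize(raw: str) -> str:
    """Trim, lowercase, drop disallowed characters, and require one '@' with both sides non-empty."""
    email = _INVALID_CHARS.sub("", raw.strip().lower())
    local, sep, domain = email.partition("@")
    if not sep or not local or not domain or "@" in domain:
        raise ValueError(f"not a usable email address: {raw!r}")
    return email


print(basic_normalize("  USER@EXAMPLE.COM  "))  # user@example.com
print(basic_normalize("us*er@exam!ple.com"))    # user@example.com
```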

## API Reference

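### `normalize_email(email: str, fix_tld_co: bool = True) -> str`

Simple function interface for email normalization.

**Parameters:**
- `email` (str): The email address to normalize
- `fix_tld_co` (bool): Whether a `.co` TLD may be corrected to `.com` (default: `True`)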

**Returns:**
- `str`: The corrected and normalized email address

**Raises:**
- `ValueError`: If the email cannot be fixed or is invalid

### `EmailTypoFixer`

Main class for email typo correction with customizable options.

#### `__init__(max_distance=1, typo_domains=None, logger=None)`

**Parameters:**
- `max_distance` (int): Maximum Levenshtein distance allowed when correcting a TLD (default: 1)
- `typo_domains` (dict): Custom dictionary mapping misspelled domains to their corrections (default: `None`)
- `logger` (logging.Logger): Custom logger instance (default: `None`)

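#### `normalize(email: str, fix_tld_co: bool = True) -> str`

Normalize and fix typos in an email address.

**Parameters:**
- `email` (str): The email address to normalize
- `fix_tld_co` (bool): Whether a `.co` TLD may be corrected to `.com` (default: `True`)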

**Returns:**
- `str`: The corrected and normalized email address

**Raises:**
- `ValueError`: If the email cannot be fixed or is invalid

## Default Typo Corrections

The library includes built-in corrections for common email provider typos:

| Typo | Correction |
|------|------------|
| gamil | gmail |
| gmial | gmail |
| gnail | gmail |
| gmaill | gmail |
| yaho | yahoo |
| yahho | yahoo |
| outlok | outlook |
| outllok | outlook |
| outlokk | outlook |
| hotmal | hotmail |
| hotmial | hotmail |
| homtail | hotmail |
| hotmaill | hotmail |
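A dictionary like this can be applied to the first label of the domain, the part before the first dot. The hypothetical `fix_domain_label` below sketches that lookup for an address that is already well-formed. It is not the library's code, and the library's lookup order is not documented here.

```python
# Hypothetical sketch of a typo-dictionary lookup; not the library's code.

DEFAULT_TYPOS = {  # a subset of the table above
    "gamil": "gmail",
    "yaho": "yahoo",
    "outlok": "outlook",
    "hotmal": "hotmail",
}


def fix_domain_label(email: str, typos: dict[str, str]) -> str:
    """Replace the first domain label if it is a known typo (expects one '@')."""
    local, _, domain = email.rpartition("@")
    label, dot, rest = domain.partition(".")
    return f"{local}@{typos.get(label, label)}{dot}{rest}"


print(fix_domain_label("john.doe@gamil.com", DEFAULT_TYPOS))  # john.doe@gmail.com
print(fix_domain_label("jane@yaho.co.uk", DEFAULT_TYPOS))     # jane@yahoo.co.uk
```

This README does not state whether the `typo_domains` argument merges with or replaces these defaults. In this sketch, merging with `{**DEFAULT_TYPOS, **custom_typos}` before the lookup would keep both.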

## Error Handling

The library raises `ValueError` for emails it cannot correct:

```python
from email_typo_fixer import normalize_email

try:
    normalize_email("invalid.email")  # Missing @ symbol
except ValueError as e:
    print(f"Cannot fix email: {e}")

try:
    normalize_email("user@")  # Missing domain
except ValueError as e:
    print(f"Cannot fix email: {e}")
```

## Requirements

- Python 3.10+
- RapidFuzz >= 3.13.0
- publicsuffixlist >= 1.0.2

## Development

### Setting up for Development

```bash
# Clone the repository
git clone https://github.com/machado000/email-typo-fixer.git
cd email-typo-fixer

# Install Poetry (if not already installed)
curl -sSL https://install.python-poetry.org | python3 -

# Install dependencies
poetry install

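# Activate the virtual environment
# (on Poetry 2.x, `poetry shell` requires the poetry-plugin-shell plugin;
# `eval $(poetry env activate)` is the built-in alternative)
poetry shell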
```

### Running Tests

```bash
# Run tests with coverage
poetry run pytest

# Run tests with verbose output
poetry run pytest -v

# Run specific test file
poetry run pytest tests/test_email_typo_fixer.py
```

### Code Quality

```bash
# Lint with flake8
poetry run flake8 email_typo_fixer tests

# Type checking with mypy
poetry run mypy email_typo_fixer
```

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.


## Acknowledgments

- Uses the [Levenshtein](https://github.com/maxbachmann/Levenshtein) and [RapidFuzz](https://github.com/rapidfuzz/RapidFuzz) libraries for string distance calculations
- Uses [publicsuffixlist](https://github.com/ko-zu/psl) for TLD (Top Level Domain) validation
- Inspired by various email validation libraries in the Python ecosystem

            
