contraction-fix


Name: contraction-fix
Version: 0.2.1
Summary: A fast and efficient library for fixing contractions in text with reverse functionality and batch processing support
Upload time: 2025-07-22 03:03:12
Author: Sean Gao
Requires Python: >=3.8
License: MIT
Homepage: https://github.com/xga0/contraction_fix
# Contraction Fix

[![PyPI version](https://img.shields.io/pypi/v/contraction-fix.svg)](https://pypi.org/project/contraction-fix/)
[![Python Versions](https://img.shields.io/pypi/pyversions/contraction-fix.svg)](https://pypi.org/project/contraction-fix/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

A fast and efficient library for fixing contractions in text. This package provides tools to expand contractions in English text while maintaining high performance and accuracy. **NEW in v0.2.1: Reverse functionality to contract expanded forms back to contractions!**

## Features

- Fast text processing using precompiled regex patterns
- **Batch processing for multiple texts with optimized performance**
- **NEW: Reverse functionality to contract expanded forms back to contractions**
- Support for standard contractions, informal contractions, and internet slang
- Configurable dictionary usage
- Optimized caching for improved performance
- Preview functionality to see contractions before fixing
- Easy addition and removal of custom contractions
- Thread-safe operations
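The features above mention precompiled regex patterns and optimized key ordering. A minimal sketch of how such an expander can work, assuming a single alternation pattern built once from a dictionary sorted longest-key-first. The dictionary contents and the names `build_pattern` and `expand` are hypothetical illustrations, not the package's internals.

```python
import re

# Hypothetical miniature dictionary (lowercase keys); the real package ships larger ones.
CONTRACTIONS = {
    "can't": "cannot",
    "won't": "will not",
    "it's": "it is",
    "i'd": "I would",
    "y'all": "you all",
}


def build_pattern(mapping):
    # Longest keys first, so a longer entry wins over any shorter entry it starts with.
    keys = sorted(mapping, key=len, reverse=True)
    alternation = "|".join(re.escape(key) for key in keys)
    # Lookarounds instead of \b, so keys that end in an apostrophe (e.g. "goin'") still match.
    return re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE)


PATTERN = build_pattern(CONTRACTIONS)  # compiled once at import time, reused on every call


def expand(text):
    def replace(match):
        word = match.group(0)
        expansion = CONTRACTIONS[word.lower()]
        # Carry a leading capital over: "Can't" -> "Cannot".
        if word[0].isupper():
            expansion = expansion[0].upper() + expansion[1:]
        return expansion

    return PATTERN.sub(replace, text)
```

Compiling one combined pattern means each call makes a single pass over the text instead of one pass per dictionary entry.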

## Installation

```bash
pip install contraction-fix
```

## Usage

### Basic Usage

#### Expanding Contractions

```python
from contraction_fix import fix

text = "I can't believe it's not butter!"
fixed_text = fix(text)
print(fixed_text)  # "I cannot believe it is not butter!"
```

#### Contracting Expanded Forms (NEW!)

```python
from contraction_fix import contract

text = "I cannot believe it is not butter!"
contracted_text = contract(text)
print(contracted_text)  # "I can't believe it's not butter!"
```
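Contracting can be sketched by inverting the contraction dictionary and compiling a second pattern over the expanded phrases. This is an assumption about the general technique, not the package's code; `FORWARD`, `REVERSE`, and this `contract` function are hypothetical. Because several contractions can share one expansion, the inversion has to pick one; here the first entry wins.

```python
import re

# Hypothetical forward dictionary (contraction -> expansion).
FORWARD = {
    "can't": "cannot",
    "won't": "will not",
    "it's": "it is",
    "they're": "they are",
    "we'll": "we will",
}

# Invert it. If two contractions share an expansion, keep the first one seen.
REVERSE = {}
for short_form, long_form in FORWARD.items():
    REVERSE.setdefault(long_form, short_form)

# Longest phrases first, so multi-word expansions win over shorter overlaps.
_keys = sorted(REVERSE, key=len, reverse=True)
REVERSE_PATTERN = re.compile(
    r"(?<!\w)(?:" + "|".join(re.escape(k) for k in _keys) + r")(?!\w)",
    re.IGNORECASE,
)


def contract(text):
    def replace(match):
        phrase = match.group(0)
        short_form = REVERSE[phrase.lower()]
        # Preserve a leading capital: "They are" -> "They're".
        if phrase[0].isupper():
            return short_form[0].upper() + short_form[1:]
        return short_form

    return REVERSE_PATTERN.sub(replace, text)
```

A purely lexical reverse pass is lossy: it would turn a sentence-final "that is what it is" into "that is what it's", so a fuller implementation may need exclusion rules.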

### Batch Processing

#### Expanding Contractions in Batch

For processing multiple texts efficiently:

```python
from contraction_fix import fix_batch

texts = [
    "I can't believe it's working!",
    "They're going to the store",
    "We'll see what happens"
]

fixed_texts = fix_batch(texts)
print(fixed_texts)
# Output: ["I cannot believe it is working!", "They are going to the store", "We will see what happens"]
```

#### Contracting Expanded Forms in Batch (NEW!)

```python
from contraction_fix import contract_batch

texts = [
    "I cannot believe it is working!",
    "They are going to the store", 
    "We will see what happens"
]

contracted_texts = contract_batch(texts)
print(contracted_texts)
# Output: ["I can't believe it's working!", "They're goin' to the store", "We'll see what happens"]
```
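One way to get the "shared cache" behaviour described for the batch functions is to route every item through a single memoized function, so duplicate inputs in a batch are processed only once. A sketch under that assumption; `expand_one` and `expand_batch` are hypothetical names, and `maxsize=1024` simply mirrors the documented `cache_size` default.

```python
import re
from functools import lru_cache
from typing import List

# Hypothetical miniature dictionary.
CONTRACTIONS = {"can't": "cannot", "it's": "it is", "they're": "they are", "we'll": "we will"}
PATTERN = re.compile(
    r"(?<!\w)(?:"
    + "|".join(re.escape(k) for k in sorted(CONTRACTIONS, key=len, reverse=True))
    + r")(?!\w)",
    re.IGNORECASE,
)


@lru_cache(maxsize=1024)
def expand_one(text: str) -> str:
    # Memoized per input string: a repeated text costs one dictionary lookup.
    def replace(match):
        word = match.group(0)
        out = CONTRACTIONS[word.lower()]
        return out[0].upper() + out[1:] if word[0].isupper() else out

    return PATTERN.sub(replace, text)


def expand_batch(texts: List[str]) -> List[str]:
    # One compiled pattern and one cache are shared across the whole list.
    return [expand_one(t) for t in texts]
```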

### Instantiating `ContractionFixer`

Start by creating an instance of the `ContractionFixer` class:

```python
from contraction_fix import ContractionFixer

fixer = ContractionFixer()
```

### Optional Parameters

- **`use_informal: bool = True`**
    - Enables informal contractions such as `"gonna"` → `"going to"`.
    - Set to `False` to skip informal expansions.
- **`use_slang: bool = True`**
    - Enables internet-slang expansions such as `"brb"` → `"be right back"`.
    - Set to `False` for formal or academic text.
- **`cache_size: int = 1024`**
    - Sets the maximum size of the LRU cache used for memoization, which speeds up processing of repeated inputs.

#### Example: Disabling slang

```python
from contraction_fix import ContractionFixer

fixer = ContractionFixer(use_slang=False)
print(fixer.fix("brb, idk what's up"))
# Output: "brb, I don't know what is up"  (brb is skipped because use_slang=False)
```
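A plausible way for `use_informal` and `use_slang` to select dictionaries is to layer the optional tiers onto the standard set when the fixer is built. A sketch under that assumption; the tier contents and `build_mapping` are illustrative only.

```python
# Hypothetical miniature versions of the three dictionary tiers.
STANDARD = {"can't": "cannot", "what's": "what is"}
INFORMAL = {"gonna": "going to", "goin'": "going"}
SLANG = {"brb": "be right back", "lol": "laughing out loud"}


def build_mapping(use_informal: bool = True, use_slang: bool = True) -> dict:
    # Standard contractions are always present; each flag adds one more tier.
    mapping = dict(STANDARD)
    if use_informal:
        mapping.update(INFORMAL)  # later tiers override earlier ones on key collisions
    if use_slang:
        mapping.update(SLANG)
    return mapping
```

With `use_slang=False`, a word like `"brb"` has no entry, so a pattern compiled from this mapping leaves it untouched.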

### Contractions vs. Possessives

The package distinguishes contractions from possessive forms:

```python
from contraction_fix import fix

text = "I can't find Sarah's keys, and she won't be at her brother's house until it's dark."
fixed_text = fix(text)
print(fixed_text)  # "I cannot find Sarah's keys, and she will not be at her brother's house until it is dark."
```

Notice how the package:
- Expands contractions: "can't" → "cannot", "won't" → "will not", "it's" → "it is"
- Preserves possessives: "Sarah's" and "brother's" remain unchanged
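A simple way to separate the two cases is to expand `'s` only after a closed set of pronouns and question words and treat every other `X's` as possessive. The word list and `expand_s` below are assumptions for illustration, not the package's actual rules.

```python
import re

# Words whose "'s" is read as "is"; every other "X's" is assumed to be possessive.
# This closed list is an illustrative assumption.
S_AS_IS = {"it", "he", "she", "that", "what", "where", "who", "there", "here"}

S_PATTERN = re.compile(r"\b(\w+)'s\b", re.IGNORECASE)


def expand_s(text: str) -> str:
    def replace(match):
        base = match.group(1)
        if base.lower() in S_AS_IS:
            return base + " is"
        return match.group(0)  # possessive such as "Sarah's": leave untouched

    return S_PATTERN.sub(replace, text)
```

A context-free rule like this cannot tell "she's here" (is) from "she's been" (has); resolving that needs a look at the following word.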

### Advanced Usage

```python
from contraction_fix import ContractionFixer

# Create a custom fixer instance
fixer = ContractionFixer(use_informal=True, use_slang=False)

# Fix single text
text = "I'd like to see y'all tomorrow"
fixed_text = fixer.fix(text)
print(fixed_text)  # "I would like to see you all tomorrow"

# Contract single text (NEW!)
expanded_text = "I would like to see you all tomorrow"
contracted_text = fixer.contract(expanded_text)
print(contracted_text)  # "I would like to see y'all tomorrow"

# Fix multiple texts efficiently
texts = [
    "I can't believe it's working",
    "They're going home",
    "We'll see what happens"
]
fixed_texts = fixer.fix_batch(texts)
print(fixed_texts)  # ["I cannot believe it is working", "They are going home", "We will see what happens"]

# Contract multiple texts efficiently (NEW!)
expanded_texts = [
    "I cannot believe it is working",
    "They are going home",
    "We will see what happens"
]
contracted_texts = fixer.contract_batch(expanded_texts)
print(contracted_texts)  # ["I can't believe it's working", "They're goin' home", "We'll see what happens"]

# Preview contractions
matches = fixer.preview(text, context_size=5)
for match in matches:
    print(f"Found '{match.text}' at position {match.start}")
    print(f"Context: '{match.context}'")
    print(f"Will be replaced with: '{match.replacement}'")

# Add custom contraction
fixer.add_contraction("gonna", "going to")

# Remove contraction
fixer.remove_contraction("won't")
```
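The preview loop above reads `text`, `start`, `context`, and `replacement` from each match. A self-contained sketch of a preview function producing objects with those fields, assuming the context is a window of `context_size` characters on each side; the `Match` dataclass and dictionary here are hypothetical stand-ins, not the package's own types.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Match:
    # Field names mirror the attributes used in the preview loop above.
    text: str
    start: int
    context: str
    replacement: str


CONTRACTIONS = {"i'd": "I would", "y'all": "you all"}
PATTERN = re.compile(
    r"(?<!\w)(?:"
    + "|".join(re.escape(k) for k in sorted(CONTRACTIONS, key=len, reverse=True))
    + r")(?!\w)",
    re.IGNORECASE,
)


def preview(text: str, context_size: int = 10) -> List[Match]:
    found = []
    for m in PATTERN.finditer(text):
        # Clamp the window so a negative start never wraps to the end of the string.
        lo = max(0, m.start() - context_size)
        hi = min(len(text), m.end() + context_size)
        found.append(Match(m.group(0), m.start(), text[lo:hi], CONTRACTIONS[m.group(0).lower()]))
    return found
```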

## Dictionary Types

The package uses three types of dictionaries:

1. **Standard Contractions**: Common English contractions like "can't", "won't", etc.
2. **Informal Contractions**: Less formal contractions and patterns like "goin'", "doin'", etc.
3. **Internet Slang**: Modern internet slang and abbreviations like "lol", "btw", etc.

## Performance

The package is optimized for speed through:
- Precompiled regex patterns with cached compilation
- LRU caching of results for repeated inputs
- Efficient dictionary lookups with optimized key ordering
- **Batch processing for multiple texts**
- Minimal memory usage with frozenset constants
- Thread-safe operations
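The combination of precompiled patterns, LRU caching, and thread safety can be sketched as a small class that rebuilds its pattern and cache under a lock whenever the dictionary changes. `ToyFixer` is hypothetical; it relies on attribute reads being atomic in CPython and on `functools.lru_cache` being safe to call from multiple threads, and it is not the package's implementation.

```python
import re
import threading
from functools import lru_cache


class ToyFixer:
    """Hypothetical fixer: precompiled pattern, LRU cache, and a lock around edits."""

    def __init__(self, mapping, cache_size=1024):
        self._lock = threading.Lock()
        self._mapping = {key.lower(): value for key, value in mapping.items()}
        self._cache_size = cache_size
        self._rebuild()

    def _rebuild(self):
        # Snapshot the dictionary so the compiled function never sees a half-edited table.
        mapping = dict(self._mapping)
        if not mapping:
            self._expand = lambda text: text  # nothing to replace
            return
        keys = sorted(mapping, key=len, reverse=True)
        pattern = re.compile(
            r"(?<!\w)(?:" + "|".join(re.escape(k) for k in keys) + r")(?!\w)",
            re.IGNORECASE,
        )

        def expand(text):
            return pattern.sub(lambda m: mapping[m.group(0).lower()], text)

        # A fresh LRU cache per rebuild, so results cached before an edit are discarded.
        self._expand = lru_cache(maxsize=self._cache_size)(expand)

    def fix(self, text):
        # Readers fetch the current function in a single attribute read.
        return self._expand(text)

    def add_contraction(self, contraction, expansion):
        with self._lock:  # serialize writers
            self._mapping[contraction.lower()] = expansion
            self._rebuild()

    def remove_contraction(self, contraction):
        with self._lock:
            self._mapping.pop(contraction.lower(), None)
            self._rebuild()
```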

### Batch Processing Performance

When processing multiple texts, use `fix_batch()` or `contract_batch()` for better performance:

```python
from contraction_fix import fix, fix_batch, contract_batch

# More efficient for multiple texts
texts = ["I can't go", "They're here", "We'll see"]
results = fix_batch(texts)  # Uses shared cache and optimized processing

# For reverse processing
expanded_texts = ["I cannot go", "They are here", "We will see"]
results = contract_batch(expanded_texts)  # Uses shared cache and optimized processing

# Less efficient for multiple texts
results = [fix(text) for text in texts]  # Creates new instances
```

## API Reference

### Functions

- `fix(text: str, use_informal: bool = True, use_slang: bool = True) -> str`
- `fix_batch(texts: List[str], use_informal: bool = True, use_slang: bool = True) -> List[str]`
- `contract(text: str, use_informal: bool = True, use_slang: bool = True) -> str` **(NEW!)**
- `contract_batch(texts: List[str], use_informal: bool = True, use_slang: bool = True) -> List[str]` **(NEW!)**

### Classes

- `ContractionFixer(use_informal: bool = True, use_slang: bool = True, cache_size: int = 1024)`
  - `fix(text: str) -> str`
  - `fix_batch(texts: List[str]) -> List[str]`
  - `contract(text: str) -> str` **(NEW!)**
  - `contract_batch(texts: List[str]) -> List[str]` **(NEW!)**
  - `preview(text: str, context_size: int = 10) -> List[Match]`
  - `add_contraction(contraction: str, expansion: str) -> None`
  - `remove_contraction(contraction: str) -> None`
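If your own code accepts a fixer as a parameter, the documented `ContractionFixer` methods can be captured in a `typing.Protocol` so that tests can pass in a lightweight stand-in. A sketch; `SupportsContractions`, `IdentityFixer`, and `normalize_corpus` are hypothetical names, and `runtime_checkable` only checks that the methods exist, not their signatures.

```python
from typing import List, Protocol, runtime_checkable


@runtime_checkable
class SupportsContractions(Protocol):
    # Method names and signatures taken from the ContractionFixer entry in the API reference.
    def fix(self, text: str) -> str: ...
    def fix_batch(self, texts: List[str]) -> List[str]: ...
    def contract(self, text: str) -> str: ...
    def contract_batch(self, texts: List[str]) -> List[str]: ...


class IdentityFixer:
    """Hypothetical test double that returns its input unchanged."""

    def fix(self, text: str) -> str:
        return text

    def fix_batch(self, texts: List[str]) -> List[str]:
        return list(texts)

    def contract(self, text: str) -> str:
        return text

    def contract_batch(self, texts: List[str]) -> List[str]:
        return list(texts)


def normalize_corpus(fixer: SupportsContractions, texts: List[str]) -> List[str]:
    # Accepts a real ContractionFixer or any stand-in with the same four methods.
    return fixer.fix_batch(texts)
```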

## What's New in v0.2.1

- **Reverse Functionality**: New `contract()` and `contract_batch()` methods to convert expanded forms back to contractions
- **Enhanced API**: Package-level convenience functions for reverse functionality
- **Comprehensive Testing**: Extensive test coverage for all new functionality
- **Improved Performance**: Optimizations for both expansion and contraction operations

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

This project is licensed under the MIT License - see the LICENSE file for details. 

            
