tidyname

- **Name**: tidyname
- **Version**: 0.1.0
- **Home page**: None
- **Summary**: Intelligent company name cleaning for Python
- **Upload time**: 2025-08-22 00:52:05
- **Maintainer**: None
- **Docs URL**: None
- **Author**: None
- **Requires Python**: >=3.13
- **License**: None
- **Keywords**: company, name, cleaning, normalization, nlp
- **Requirements**: No requirements were recorded.

# TidyName

Intelligent company name cleaning for Python.

TidyName is a Python package that removes legal entity terms and
organization type indicators (such as "Inc." or "LLC") from company names,
while leaving them in place when they are part of the actual business name
(such as "The Limited").

## Features

- **Smart Detection**: Identifies and removes corporate suffixes (LLC, Inc., Ltd., etc.)
- **Intelligent Preservation**: Preserves terms when they're part of brand names (e.g., "The Limited")
- **Confidence Scoring**: Provides confidence levels for each cleaning decision
- **International Support**: Handles international corporate suffixes (GmbH, S.A., etc.)
- **Batch Processing**: Clean multiple company names efficiently
- **Configurable**: Customize behavior through configuration options
- **Pure Python**: No external dependencies required
- **Type Safety**: Full type annotations for better IDE support

## Requirements

- Python 3.13+

## Installation

```bash
pip install tidyname

# For development
git clone https://github.com/michellepellon/tidyname.git
cd tidyname
uv sync
```

## Quick Start

```python
from tidyname import Cleaner

# Initialize the cleaner
cleaner = Cleaner()

# Clean a single company name
result = cleaner.clean("Apple Inc.")

print(result.original)         # "Apple Inc."
print(result.cleaned)          # "Apple"
print(result.confidence)       # 0.95
print(result.confidence_level) # "high"
print(result.changes_made)     # True
print(result.reason)           # "Removed: Inc."
```

## Advanced Usage

### Batch Processing

```python
from tidyname import Cleaner

cleaner = Cleaner()

companies = [
    "Apple Inc.",
    "Microsoft Corporation", 
    "Google LLC",
    "The Limited",  # Will be preserved
    "Amazon"        # No changes needed
]

results = cleaner.clean_batch(companies)

for result in results:
    print(f"{result.original} → {result.cleaned}")
```
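
Once a batch has been cleaned, a common next step is to keep only the names that actually changed. The sketch below assumes nothing beyond the documented `original`, `cleaned`, and `changes_made` fields; `StubResult` and `changed_names` are hypothetical names, and the stub stands in for `CleaningResult` so the snippet runs without TidyName installed.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class StubResult:
    """Stand-in exposing the documented CleaningResult fields used below."""
    original: str
    cleaned: str
    changes_made: bool


def changed_names(results: Iterable[StubResult]) -> dict[str, str]:
    """Map each original name to its cleaned form, skipping unchanged names."""
    return {r.original: r.cleaned for r in results if r.changes_made}


# Illustrative values only; real results would come from cleaner.clean_batch(...).
sample = [
    StubResult("Apple Inc.", "Apple", True),
    StubResult("The Limited", "The Limited", False),
]
mapping = changed_names(sample)
```

Because the helper only reads attributes, it should accept the objects returned by `clean_batch` as well as the stub.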

### Configuration Options

```python
from tidyname import Cleaner, CleanerConfig

# Custom configuration
config = CleanerConfig(
    remove_corporate_suffixes=True,     # Enable/disable suffix removal
    preserve_known_brands=True,         # Preserve known brand names
    min_confidence_threshold=0.7        # Minimum confidence for changes
)

cleaner = Cleaner(config=config)

# Or configure after initialization
cleaner.configure(
    preserve_known_brands=False,
    min_confidence_threshold=0.8
)
```

## Supported Terms

### Corporate Suffixes
- **Corporation**: Company, Incorporated, Corporation, Corp., Corp, Inc., Inc
- **Limited Liability**: LLC, L.L.C., PLC, P.L.C.
- **Limited**: Limited, Ltd., Ltd, Co., Co
- **Partnership**: & Co., & Co, LLP, L.L.P.
- **Professional**: Professional Corporation, P.C., PC

### International Suffixes
- **German**: GmbH, AG
- **French**: S.A., S.A
- **Dutch**: N.V., B.V.
- **Italian**: S.r.l., S.p.A.
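
To make the suffix categories concrete, here is a minimal standard-library sketch of trailing-suffix removal. It is not TidyName's implementation: `SUFFIXES` is a hypothetical subset of the terms listed above, and `strip_suffixes` is an illustrative helper with no brand preservation or confidence scoring.

```python
import re

# Hypothetical subset of the documented suffixes; not TidyName's internal table.
SUFFIXES = [
    "Incorporated", "Corporation", "Corp.", "Corp", "Inc.", "Inc",
    "LLC", "L.L.C.", "Limited", "Ltd.", "Ltd",
    "GmbH", "AG", "S.A.", "N.V.", "B.V.",
]

# Longest alternatives first so "Corp." is tried before "Corp".
_ALTERNATION = "|".join(
    re.escape(s) for s in sorted(SUFFIXES, key=len, reverse=True)
)
# A suffix must follow whitespace or a comma and end the string.
_TRAILING_SUFFIX = re.compile(rf"[\s,]+(?:{_ALTERNATION})\s*$", re.IGNORECASE)


def strip_suffixes(name: str) -> str:
    """Remove trailing suffixes repeatedly, so stacked suffixes are handled."""
    current = name.strip()
    while True:
        stripped = _TRAILING_SUFFIX.sub("", current)
        if stripped == current:
            return current
        current = stripped
```

A purely pattern-based approach like this reduces "The Limited" to "The", which is the kind of mistake the preservation and confidence features are meant to avoid.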

## Examples

### Basic Cleaning

```python
from tidyname import Cleaner

cleaner = Cleaner()

# Standard corporate suffixes
print(cleaner.clean("Apple Inc.").cleaned)           # "Apple"
print(cleaner.clean("Microsoft Corporation").cleaned) # "Microsoft"
print(cleaner.clean("Google LLC").cleaned)           # "Google"

# International suffixes
print(cleaner.clean("Siemens AG").cleaned)           # "Siemens"
print(cleaner.clean("L'Oréal S.A.").cleaned)        # "L'Oréal"

# Multiple suffixes
print(cleaner.clean("Tech Solutions Inc. LLC").cleaned) # "Tech Solutions"
```

### Brand Preservation

```python
from tidyname import Cleaner

cleaner = Cleaner()

# These will be preserved as they're known brands
result = cleaner.clean("The Limited")
print(result.cleaned)      # "The Limited"
print(result.changes_made) # False

result = cleaner.clean("Limited Brands")
print(result.cleaned)      # "Limited Brands"
print(result.changes_made) # False
```
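
One simple way to think about brand preservation is an allow-list lookup performed before any suffix is removed. The sketch below is an assumption about the general idea, not a description of TidyName's internals: `KNOWN_BRANDS`, `normalize`, and `should_preserve` are hypothetical, and the list contains only the two brands named above.

```python
# Hypothetical allow-list built from the two examples above.
KNOWN_BRANDS = frozenset({"the limited", "limited brands"})


def normalize(name: str) -> str:
    """Collapse runs of whitespace and ignore case for comparison."""
    return " ".join(name.split()).casefold()


def should_preserve(name: str) -> bool:
    """Return True when the whole name matches a known brand."""
    return normalize(name) in KNOWN_BRANDS
```

Matching on the whole normalized name keeps "The Limited" intact while still allowing "Acme Limited" to be cleaned.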

### Confidence and Reasoning

```python
from tidyname import Cleaner

cleaner = Cleaner()

result = cleaner.clean("Apple Inc.")

print(f"Confidence: {result.confidence}")           # 0.95
print(f"Level: {result.confidence_level}")          # "high"
print(f"Reasoning: {result.reason}")                # "Removed: Inc."

# Low confidence example
result = cleaner.clean("Limited Edition")
print(f"Confidence: {result.confidence}")           # Lower score
print(f"Reasoning: {result.reason}")                # Preservation reasoning
```
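
Confidence scores are useful for routing uncertain decisions to manual review. The following sketch assumes only that each result exposes a numeric `confidence` between 0.0 and 1.0; `Scored` and `split_by_confidence` are hypothetical names, and the sample scores are illustrative rather than real TidyName output.

```python
from typing import Iterable, NamedTuple


class Scored(NamedTuple):
    """Stand-in exposing the documented fields used for routing."""
    original: str
    cleaned: str
    confidence: float


def split_by_confidence(
    results: Iterable[Scored], threshold: float = 0.7
) -> tuple[list[Scored], list[Scored]]:
    """Split results into (accepted, needs_review) around the threshold."""
    accepted: list[Scored] = []
    needs_review: list[Scored] = []
    for result in results:
        bucket = accepted if result.confidence >= threshold else needs_review
        bucket.append(result)
    return accepted, needs_review


# Illustrative scores only.
sample = [
    Scored("Apple Inc.", "Apple", 0.95),
    Scored("Limited Edition", "Limited Edition", 0.4),
]
accepted, needs_review = split_by_confidence(sample, threshold=0.7)
```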

## API Reference

### Cleaner Class

#### `__init__(config: CleanerConfig | None = None)`
Initialize the cleaner with optional configuration.

#### `clean(company_name: str) -> CleaningResult`
Clean a single company name.

**Parameters:**
- `company_name`: The company name to clean

**Returns:**
- `CleaningResult` object with cleaning results and metadata

#### `clean_batch(company_names: list[str]) -> list[CleaningResult]`
Clean multiple company names.

**Parameters:**
- `company_names`: List of company names to clean

**Returns:**
- List of `CleaningResult` objects

#### `configure(**kwargs) -> None`
Update configuration settings.

### CleaningResult

Result object containing:
- `original`: Original company name
- `cleaned`: Cleaned company name
- `confidence`: Confidence score (0.0 to 1.0)
- `confidence_level`: "high", "medium", or "low"
- `changes_made`: Boolean indicating if changes were made
- `reason`: Human-readable explanation of the decision

### CleanerConfig

Configuration object with:
- `remove_corporate_suffixes`: Enable suffix removal (default: True)
- `preserve_known_brands`: Preserve known brand names (default: True)
- `min_confidence_threshold`: Minimum confidence for changes (default: 0.5)
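
For tests or documentation, the three options and their documented defaults can be mirrored in a small dataclass. This is a sketch, not TidyName's `CleanerConfig`: `ConfigSketch` and `updated` are hypothetical, and the 0.0 to 1.0 range check on the threshold is an assumption based on the documented confidence range.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class ConfigSketch:
    """Hypothetical mirror of the documented options and defaults."""
    remove_corporate_suffixes: bool = True
    preserve_known_brands: bool = True
    min_confidence_threshold: float = 0.5

    def __post_init__(self) -> None:
        # Assumed range, matching the documented confidence score range.
        if not 0.0 <= self.min_confidence_threshold <= 1.0:
            raise ValueError("min_confidence_threshold must be in [0.0, 1.0]")


def updated(config: ConfigSketch, **kwargs: object) -> ConfigSketch:
    """Return a copy with overrides applied, rejecting unknown option names."""
    known = {f.name for f in fields(ConfigSketch)}
    unknown = set(kwargs) - known
    if unknown:
        raise TypeError(f"unknown option(s): {sorted(unknown)}")
    return replace(config, **kwargs)
```

Rejecting unknown names catches typos such as `min_confidence_treshold` early instead of silently ignoring them.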

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new functionality
5. Run the test suite: `uv run pytest`
6. Submit a pull request

## License

MIT License - see LICENSE file for details.

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "tidyname",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.13",
    "maintainer_email": null,
    "keywords": "company, name, cleaning, normalization, nlp",
    "author": null,
    "author_email": "Michelle Pellon <mgracepellon@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/51/70/250fd335b56fb6bda132c37c04fbaf44f739799bc86598467a52e8c2adc8/tidyname-0.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Intelligent company name cleaning for Python",
    "version": "0.1.0",
    "project_urls": {
        "Homepage": "https://github.com/michellepellon/tidyname",
        "Issues": "https://github.com/michellepellon/tidyname/issues",
        "Repository": "https://github.com/michellepellon/tidyname"
    },
    "split_keywords": [
        "company",
        " name",
        " cleaning",
        " normalization",
        " nlp"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "19302855bb9fd53c89006bfcd4f77651aebb1377bb1bfc87e1f8871df60282e3",
                "md5": "9f3701acb57a8aa3cc153c25092161b5",
                "sha256": "5e9e2c7a1a6334ff6d358bd1dd5cdcc6df99fb9a457b3a2fae6e1b717cc9acec"
            },
            "downloads": -1,
            "filename": "tidyname-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "9f3701acb57a8aa3cc153c25092161b5",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.13",
            "size": 10936,
            "upload_time": "2025-08-22T00:52:04",
            "upload_time_iso_8601": "2025-08-22T00:52:04.974452Z",
            "url": "https://files.pythonhosted.org/packages/19/30/2855bb9fd53c89006bfcd4f77651aebb1377bb1bfc87e1f8871df60282e3/tidyname-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "5170250fd335b56fb6bda132c37c04fbaf44f739799bc86598467a52e8c2adc8",
                "md5": "70e5a31f6986fe3ed71ec046dd317402",
                "sha256": "4276f20ff828d49808d3d3afbe25831e1a455c2cb2773b8e0dfc0f3c99f4eee9"
            },
            "downloads": -1,
            "filename": "tidyname-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "70e5a31f6986fe3ed71ec046dd317402",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.13",
            "size": 16569,
            "upload_time": "2025-08-22T00:52:05",
            "upload_time_iso_8601": "2025-08-22T00:52:05.940161Z",
            "url": "https://files.pythonhosted.org/packages/51/70/250fd335b56fb6bda132c37c04fbaf44f739799bc86598467a52e8c2adc8/tidyname-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-22 00:52:05",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "michellepellon",
    "github_project": "tidyname",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "tidyname"
}
        