Name | tidyname |
Version | 0.1.0 |
home_page | None |
Summary | Intelligent company name cleaning for Python |
upload_time | 2025-08-22 00:52:05 |
maintainer | None |
docs_url | None |
author | None |
requires_python | >=3.13 |
license | None |
keywords | company, name, cleaning, normalization, nlp |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# TidyName
Intelligent company name cleaning for Python.
TidyName is a Python package that removes legal entity terms and
organization type indicators (such as "Inc." or "GmbH") from company names,
while preserving them where they are part of the actual business name
(such as "The Limited").
## Features
- **Smart Detection**: Identifies and removes corporate suffixes (LLC, Inc., Ltd., etc.)
- **Intelligent Preservation**: Preserves terms when they're part of brand names (e.g., "The Limited")
- **Confidence Scoring**: Provides confidence levels for each cleaning decision
- **International Support**: Handles international corporate suffixes (GmbH, S.A., etc.)
- **Batch Processing**: Clean multiple company names efficiently
- **Configurable**: Customize behavior through configuration options
- **Pure Python**: No external dependencies required
- **Type Safety**: Full type annotations for better IDE support
## Requirements
- Python 3.13+
## Installation
```bash
pip install tidyname
# For development
git clone https://github.com/michellepellon/tidyname.git
cd tidyname
uv sync
```
## Quick Start
```python
from tidyname import Cleaner
# Initialize the cleaner
cleaner = Cleaner()
# Clean a single company name
result = cleaner.clean("Apple Inc.")
print(result.original) # "Apple Inc."
print(result.cleaned) # "Apple"
print(result.confidence) # 0.95
print(result.confidence_level) # "high"
print(result.changes_made) # True
print(result.reason) # "Removed: Inc."
```
## Advanced Usage
### Batch Processing
```python
from tidyname import Cleaner
cleaner = Cleaner()
companies = [
    "Apple Inc.",
    "Microsoft Corporation",
    "Google LLC",
    "The Limited",  # Will be preserved
    "Amazon",  # No changes needed
]
results = cleaner.clean_batch(companies)
for result in results:
    print(f"{result.original} → {result.cleaned}")
```
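A common follow-up to batch cleaning is grouping the original names that collapse to the same cleaned name. The sketch below is not part of TidyName: `group_by_cleaned` is a hypothetical helper that relies only on the `original` and `cleaned` attributes documented for `CleaningResult`, and the sample uses stand-in objects so it runs without the package installed.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_cleaned(results):
    """Map each cleaned name (case-insensitive) to the originals that produced it."""
    groups = defaultdict(list)
    for result in results:
        groups[result.cleaned.casefold()].append(result.original)
    return dict(groups)


# Stand-ins for CleaningResult objects; in real use, pass the list
# returned by cleaner.clean_batch(...) instead.
sample = [
    SimpleNamespace(original="Apple Inc.", cleaned="Apple"),
    SimpleNamespace(original="Apple Incorporated", cleaned="Apple"),
    SimpleNamespace(original="Amazon", cleaned="Amazon"),
]
print(group_by_cleaned(sample))
# {'apple': ['Apple Inc.', 'Apple Incorporated'], 'amazon': ['Amazon']}
```

Keying on `casefold()` is a design choice for this sketch; drop it if case differences should keep names apart.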
### Configuration Options
```python
from tidyname import Cleaner, CleanerConfig
# Custom configuration
config = CleanerConfig(
    remove_corporate_suffixes=True,  # Enable/disable suffix removal
    preserve_known_brands=True,  # Preserve known brand names
    min_confidence_threshold=0.7,  # Minimum confidence for changes
)
cleaner = Cleaner(config=config)
# Or configure after initialization
cleaner.configure(
    preserve_known_brands=False,
    min_confidence_threshold=0.8,
)
```
## Supported Terms
### Corporate Suffixes
- **Corporation**: Company, Incorporated, Corporation, Corp., Corp, Inc., Inc
- **Limited Liability**: LLC, L.L.C., PLC, P.L.C.
- **Limited**: Limited, Ltd., Ltd, Co., Co
- **Partnership**: & Co., & Co, LLP, L.L.P.
- **Professional**: Professional Corporation, P.C., PC
### International Suffixes
- **German**: GmbH, AG
- **French**: S.A., S.A
- **Dutch**: N.V., B.V.
- **Italian**: S.r.l., S.p.A.
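To make the idea of suffix matching concrete, here is a minimal sketch of stripping trailing terms with a regular expression. It is not TidyName's implementation: `SUFFIXES` and `strip_suffixes` are hypothetical names, the list covers only a subset of the terms above, and there is no brand preservation or confidence scoring.

```python
import re

# Hypothetical subset of the suffixes listed above; not TidyName's internal table.
SUFFIXES = [
    "Incorporated", "Corporation", "Corp.", "Corp", "Inc.", "Inc",
    "L.L.C.", "LLC", "Limited", "Ltd.", "Ltd",
    "GmbH", "AG", "S.A.", "N.V.", "B.V.",
]

# Longest alternatives first, so "Corp." is tried before "Corp".
_alternatives = "|".join(
    re.escape(s) for s in sorted(SUFFIXES, key=len, reverse=True)
)
# One suffix at the end of the string, preceded by whitespace or a comma.
_TRAILING = re.compile(rf"[\s,]+(?:{_alternatives})\s*$", re.IGNORECASE)


def strip_suffixes(name: str) -> str:
    """Remove trailing suffixes one at a time until none is left."""
    cleaned = name.strip()
    while True:
        stripped = _TRAILING.sub("", cleaned)
        if stripped == cleaned:
            return cleaned
        cleaned = stripped


print(strip_suffixes("Tech Solutions Inc. LLC"))  # Tech Solutions
```

This naive approach also turns "The Limited" into "The", which is exactly the kind of mistake the brand-preservation feature is meant to avoid.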
## Examples
### Basic Cleaning
```python
from tidyname import Cleaner
cleaner = Cleaner()
# Standard corporate suffixes
print(cleaner.clean("Apple Inc.").cleaned) # "Apple"
print(cleaner.clean("Microsoft Corporation").cleaned) # "Microsoft"
print(cleaner.clean("Google LLC").cleaned) # "Google"
# International suffixes
print(cleaner.clean("Siemens AG").cleaned) # "Siemens"
print(cleaner.clean("L'Oréal S.A.").cleaned) # "L'Oréal"
# Multiple suffixes
print(cleaner.clean("Tech Solutions Inc. LLC").cleaned) # "Tech Solutions"
```
### Brand Preservation
```python
from tidyname import Cleaner
cleaner = Cleaner()
# These will be preserved as they're known brands
result = cleaner.clean("The Limited")
print(result.cleaned) # "The Limited"
print(result.changes_made) # False
result = cleaner.clean("Limited Brands")
print(result.cleaned) # "Limited Brands"
print(result.changes_made) # False
```
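One simple way to think about preservation is an allow-list check that runs before any suffix is removed. The sketch below illustrates that idea only; `KNOWN_BRANDS` and `should_preserve` are hypothetical, and how TidyName actually decides what to preserve is not documented here.

```python
# Hypothetical allow-list built from the two examples above.
KNOWN_BRANDS = frozenset({"the limited", "limited brands"})


def should_preserve(name: str, known_brands: frozenset[str] = KNOWN_BRANDS) -> bool:
    """Return True when the whole name, ignoring case and extra spaces, is a known brand."""
    normalized = " ".join(name.split()).casefold()
    return normalized in known_brands


print(should_preserve("The Limited"))  # True
print(should_preserve("Apple Inc."))   # False
```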
### Confidence and Reasoning
```python
from tidyname import Cleaner
cleaner = Cleaner()
result = cleaner.clean("Apple Inc.")
print(f"Confidence: {result.confidence}") # 0.95
print(f"Level: {result.confidence_level}") # "high"
print(f"Reasoning: {result.reason}") # "Removed: Inc."
# Low confidence example
result = cleaner.clean("Limited Edition")
print(f"Confidence: {result.confidence}") # Lower score
print(f"Reasoning: {result.reason}") # Preservation reasoning
```
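Confidence scores are useful for routing uncertain decisions to manual review. The following sketch assumes only the `original`, `cleaned`, and `confidence` fields documented for `CleaningResult`; `Result` is a stand-in dataclass, `split_for_review` is a hypothetical helper, and the 0.4 score is an illustrative value rather than real library output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """Stand-in exposing a few of the documented CleaningResult fields."""
    original: str
    cleaned: str
    confidence: float


def split_for_review(results, threshold=0.8):
    """Return (accepted, needs_review), split on the confidence score."""
    accepted = [r for r in results if r.confidence >= threshold]
    needs_review = [r for r in results if r.confidence < threshold]
    return accepted, needs_review


sample = [
    Result("Apple Inc.", "Apple", 0.95),
    Result("Limited Edition", "Limited Edition", 0.4),  # illustrative score
]
accepted, needs_review = split_for_review(sample)
print([r.original for r in needs_review])  # ['Limited Edition']
```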
## API Reference
### Cleaner Class
#### `__init__(config: CleanerConfig | None = None)`
Initialize the cleaner with optional configuration.
#### `clean(company_name: str) -> CleaningResult`
Clean a single company name.
**Parameters:**
- `company_name`: The company name to clean
**Returns:**
- `CleaningResult` object with cleaning results and metadata
#### `clean_batch(company_names: list[str]) -> list[CleaningResult]`
Clean multiple company names.
**Parameters:**
- `company_names`: List of company names to clean
**Returns:**
- List of `CleaningResult` objects
#### `configure(**kwargs) -> None`
Update configuration settings.
### CleaningResult
Result object containing:
- `original`: Original company name
- `cleaned`: Cleaned company name
- `confidence`: Confidence score (0.0 to 1.0)
- `confidence_level`: "high", "medium", or "low"
- `changes_made`: Boolean indicating if changes were made
- `reason`: Human-readable explanation of the decision
### CleanerConfig
Configuration object with:
- `remove_corporate_suffixes`: Enable suffix removal (default: True)
- `preserve_known_brands`: Preserve known brand names (default: True)
- `min_confidence_threshold`: Minimum confidence for changes (default: 0.5)
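As a rough mental model for `min_confidence_threshold`, a proposed change is applied only when its confidence reaches the threshold; otherwise the original name is kept. The helper below is a hypothetical sketch of that gating, not the library's code, and whether TidyName treats the boundary as inclusive is an assumption.

```python
def apply_threshold(
    original: str,
    candidate: str,
    confidence: float,
    min_confidence_threshold: float = 0.5,
) -> str:
    """Keep the original unless the proposed change is confident enough."""
    # Assumption: the comparison is inclusive (>=).
    if confidence >= min_confidence_threshold:
        return candidate
    return original


print(apply_threshold("Apple Inc.", "Apple", 0.95))            # Apple
print(apply_threshold("Limited Edition", "Limited", 0.3))      # Limited Edition
```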
## Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new functionality
5. Run the test suite: `uv run pytest`
6. Submit a pull request
## License
MIT License - see LICENSE file for details.
Raw data
{
"_id": null,
"home_page": null,
"name": "tidyname",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.13",
"maintainer_email": null,
"keywords": "company, name, cleaning, normalization, nlp",
"author": null,
"author_email": "Michelle Pellon <mgracepellon@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/51/70/250fd335b56fb6bda132c37c04fbaf44f739799bc86598467a52e8c2adc8/tidyname-0.1.0.tar.gz",
"platform": null,
"description": "# TidyName\n\nIntelligent company name cleaning for Python.\n\nTidyName is a Python package that intelligently removes legal entity terms and \norganization type indicators from company names while preserving cases where \nthese terms are part of the actual business name.\n\n## Features\n\n- **Smart Detection**: Identifies and removes corporate suffixes (LLC, Inc., Ltd., etc.)\n- **Intelligent Preservation**: Preserves terms when they're part of brand names (e.g., \"The Limited\")\n- **Confidence Scoring**: Provides confidence levels for each cleaning decision\n- **International Support**: Handles international corporate suffixes (GmbH, S.A., etc.)\n- **Batch Processing**: Clean multiple company names efficiently\n- **Configurable**: Customize behavior through configuration options\n- **Pure Python**: No external dependencies required\n- **Type Safety**: Full type annotations for better IDE support\n\n## Requirements\n\n- Python 3.13+\n\n## Installation\n\n```bash\npip install tidyname\n\n# For development\ngit clone https://github.com/your-repo/tidyname.git\ncd tidyname\nuv install\n```\n\n## Quick Start\n\n```python\nfrom tidyname import Cleaner\n\n# Initialize the cleaner\ncleaner = Cleaner()\n\n# Clean a single company name\nresult = cleaner.clean(\"Apple Inc.\")\n\nprint(result.original) # \"Apple Inc.\"\nprint(result.cleaned) # \"Apple\"\nprint(result.confidence) # 0.95\nprint(result.confidence_level) # \"high\"\nprint(result.changes_made) # True\nprint(result.reason) # \"Removed: Inc.\"\n```\n\n## Advanced Usage\n\n### Batch Processing\n\n```python\nfrom tidyname import Cleaner\n\ncleaner = Cleaner()\n\ncompanies = [\n \"Apple Inc.\",\n \"Microsoft Corporation\", \n \"Google LLC\",\n \"The Limited\", # Will be preserved\n \"Amazon\" # No changes needed\n]\n\nresults = cleaner.clean_batch(companies)\n\nfor result in results:\n print(f\"{result.original} \u2192 {result.cleaned}\")\n```\n\n### Configuration Options\n\n```python\nfrom tidyname 
import Cleaner, CleanerConfig\n\n# Custom configuration\nconfig = CleanerConfig(\n remove_corporate_suffixes=True, # Enable/disable suffix removal\n preserve_known_brands=True, # Preserve known brand names\n min_confidence_threshold=0.7 # Minimum confidence for changes\n)\n\ncleaner = Cleaner(config=config)\n\n# Or configure after initialization\ncleaner.configure(\n preserve_known_brands=False,\n min_confidence_threshold=0.8\n)\n```\n\n## Supported Terms\n\n### Corporate Suffixes\n- **Corporation**: Company, Incorporated, Corporation, Corp., Corp, Inc., Inc\n- **Limited Liability**: LLC, L.L.C., PLC, P.L.C.\n- **Limited**: Limited, Ltd., Ltd, Co., Co\n- **Partnership**: & Co., & Co, LLP, L.L.P.\n- **Professional**: Professional Corporation, P.C., PC\n\n### International Suffixes\n- **German**: GmbH, AG\n- **French**: S.A., S.A\n- **Dutch**: N.V., B.V.\n- **Italian**: S.r.l., S.p.A.\n\n## Examples\n\n### Basic Cleaning\n\n```python\nfrom tidyname import Cleaner\n\ncleaner = Cleaner()\n\n# Standard corporate suffixes\nprint(cleaner.clean(\"Apple Inc.\").cleaned) # \"Apple\"\nprint(cleaner.clean(\"Microsoft Corporation\").cleaned) # \"Microsoft\"\nprint(cleaner.clean(\"Google LLC\").cleaned) # \"Google\"\n\n# International suffixes\nprint(cleaner.clean(\"Siemens AG\").cleaned) # \"Siemens\"\nprint(cleaner.clean(\"L'Or\u00e9al S.A.\").cleaned) # \"L'Or\u00e9al\"\n\n# Multiple suffixes\nprint(cleaner.clean(\"Tech Solutions Inc. 
LLC\").cleaned) # \"Tech Solutions\"\n```\n\n### Brand Preservation\n\n```python\nfrom tidyname import Cleaner\n\ncleaner = Cleaner()\n\n# These will be preserved as they're known brands\nresult = cleaner.clean(\"The Limited\")\nprint(result.cleaned) # \"The Limited\"\nprint(result.changes_made) # False\n\nresult = cleaner.clean(\"Limited Brands\")\nprint(result.cleaned) # \"Limited Brands\"\nprint(result.changes_made) # False\n```\n\n### Confidence and Reasoning\n\n```python\nfrom tidyname import Cleaner\n\ncleaner = Cleaner()\n\nresult = cleaner.clean(\"Apple Inc.\")\n\nprint(f\"Confidence: {result.confidence}\") # 0.95\nprint(f\"Level: {result.confidence_level}\") # \"high\"\nprint(f\"Reasoning: {result.reason}\") # \"Removed: Inc.\"\n\n# Low confidence example\nresult = cleaner.clean(\"Limited Edition\")\nprint(f\"Confidence: {result.confidence}\") # Lower score\nprint(f\"Reasoning: {result.reason}\") # Preservation reasoning\n```\n\n## API Reference\n\n### Cleaner Class\n\n#### `__init__(config: CleanerConfig | None = None)`\nInitialize the cleaner with optional configuration.\n\n#### `clean(company_name: str) -> CleaningResult`\nClean a single company name.\n\n**Parameters:**\n- `company_name`: The company name to clean\n\n**Returns:**\n- `CleaningResult` object with cleaning results and metadata\n\n#### `clean_batch(company_names: list[str]) -> list[CleaningResult]`\nClean multiple company names.\n\n**Parameters:**\n- `company_names`: List of company names to clean\n\n**Returns:**\n- List of `CleaningResult` objects\n\n#### `configure(**kwargs) -> None`\nUpdate configuration settings.\n\n### CleaningResult\n\nResult object containing:\n- `original`: Original company name\n- `cleaned`: Cleaned company name\n- `confidence`: Confidence score (0.0 to 1.0)\n- `confidence_level`: \"high\", \"medium\", or \"low\"\n- `changes_made`: Boolean indicating if changes were made\n- `reason`: Human-readable explanation of the decision\n\n### CleanerConfig\n\nConfiguration 
object with:\n- `remove_corporate_suffixes`: Enable suffix removal (default: True)\n- `preserve_known_brands`: Preserve known brand names (default: True)\n- `min_confidence_threshold`: Minimum confidence for changes (default: 0.5)\n\n## Contributing\n\n1. Fork the repository\n2. Create a feature branch\n3. Make your changes\n4. Add tests for new functionality\n5. Run the test suite: `uv run pytest`\n6. Submit a pull request\n\n## License\n\nMIT License - see LICENSE file for details.\n",
"bugtrack_url": null,
"license": null,
"summary": "Intelligent company name cleaning for Python",
"version": "0.1.0",
"project_urls": {
"Homepage": "https://github.com/michellepellon/tidyname",
"Issues": "https://github.com/michellepellon/tidyname/issues",
"Repository": "https://github.com/michellepellon/tidyname"
},
"split_keywords": [
"company",
" name",
" cleaning",
" normalization",
" nlp"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "19302855bb9fd53c89006bfcd4f77651aebb1377bb1bfc87e1f8871df60282e3",
"md5": "9f3701acb57a8aa3cc153c25092161b5",
"sha256": "5e9e2c7a1a6334ff6d358bd1dd5cdcc6df99fb9a457b3a2fae6e1b717cc9acec"
},
"downloads": -1,
"filename": "tidyname-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "9f3701acb57a8aa3cc153c25092161b5",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.13",
"size": 10936,
"upload_time": "2025-08-22T00:52:04",
"upload_time_iso_8601": "2025-08-22T00:52:04.974452Z",
"url": "https://files.pythonhosted.org/packages/19/30/2855bb9fd53c89006bfcd4f77651aebb1377bb1bfc87e1f8871df60282e3/tidyname-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "5170250fd335b56fb6bda132c37c04fbaf44f739799bc86598467a52e8c2adc8",
"md5": "70e5a31f6986fe3ed71ec046dd317402",
"sha256": "4276f20ff828d49808d3d3afbe25831e1a455c2cb2773b8e0dfc0f3c99f4eee9"
},
"downloads": -1,
"filename": "tidyname-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "70e5a31f6986fe3ed71ec046dd317402",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.13",
"size": 16569,
"upload_time": "2025-08-22T00:52:05",
"upload_time_iso_8601": "2025-08-22T00:52:05.940161Z",
"url": "https://files.pythonhosted.org/packages/51/70/250fd335b56fb6bda132c37c04fbaf44f739799bc86598467a52e8c2adc8/tidyname-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-22 00:52:05",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "michellepellon",
"github_project": "tidyname",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "tidyname"
}