simple-anonymizer


- **Name**: simple-anonymizer
- **Version**: 0.1.18
- **Summary**: Privacy-first text anonymization tool with enterprise-grade accuracy for removing PII from documents
- **Upload time**: 2025-07-23 07:51:26
- **Author / Maintainer**: Andrea Tirelli
- **Requires Python**: >=3.9, <3.14
- **License**: Apache-2.0
- **Keywords**: privacy, anonymization, pii, nlp, spacy, presidio, data-protection, text-processing, privacy-tools, gdpr, enterprise
- **Requirements**: No requirements were recorded.

# 🕵️ Anon - Privacy-First Text Anonymizer

[![CI](https://github.com/ATirelli/anonymizer/actions/workflows/ci.yml/badge.svg)](https://github.com/ATirelli/anonymizer/actions/workflows/ci.yml)
[![PyPI](https://img.shields.io/pypi/v/simple-anonymizer.svg)](https://pypi.org/project/simple-anonymizer/)
[![Python Version](https://img.shields.io/pypi/pyversions/simple-anonymizer.svg)](https://pypi.org/project/simple-anonymizer/)


A powerful, **offline-first** text anonymization tool that removes personally identifiable information (PII) from text while keeping all data on your machine. Detection is built on spaCy NER models and Microsoft Presidio.

## ✨ Features

- 🔒 **100% Offline** - All processing happens on your machine
- 🎯 **High Accuracy** - Advanced NER using spaCy large models + Presidio
- 🔐 **Secure Always-Redact** - Custom sensitive terms stored securely in `~/.anonymizer`
- 🖥️ **Multiple Interfaces** - Modern GUI, Web API, and CLI
- 🚀 **Background Processing** - CLIs run detached with proper logging
- 📦 **Easy Installation** - One-command install with automatic model setup
- 🏢 **Cross-Platform** - Windows, macOS, and Linux support

## 🚀 Quick Start

### Installation

```bash
pip install simple-anonymizer
```

The installer attempts to download the required spaCy model (`en_core_web_lg`) automatically. If that step fails, use the Model Setup steps that follow.

### Model Setup

After installation, you may need to set up the required spaCy models for full functionality:

```bash
# Automatic setup (recommended)
anon-setup-models
```

This will install:
- `en_core_web_lg` - Primary model for high-accuracy PII detection
- `en_core_web_sm` - Compatibility model for text-anonymizer integration

**Manual Installation** (if automatic setup fails):
```bash
# Primary model (large, best accuracy)
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.8.0/en_core_web_lg-3.8.0-py3-none-any.whl

# Compatibility model (small, for text-anonymizer)
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl
```

**Corporate Networks** (if SSL certificate issues occur):
```bash
pip install --trusted-host github.com --trusted-host objects.githubusercontent.com https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.8.0/en_core_web_lg-3.8.0-py3-none-any.whl
```

ℹ️ **Note**: The anonymizer will work with pattern-based detection and always-redact functionality even without the spaCy models, but accuracy will be reduced.

### GUI Application

Launch the GUI:

```bash
anon-gui
```

✅ **The GUI runs in the background** - you can close the terminal after launch

📝 **Logs available** at `~/.anonymizer/gui_YYYYMMDD_HHMMSS.log`

### Web Interface

Start the web server:

```bash
anon-web start
```

✅ **Server runs in the background** - accessible at http://127.0.0.1:8080

📝 **Comprehensive logging** and process management

#### Web Server Management

```bash
# Start server (custom host/port)
anon-web start --host 0.0.0.0 --port 5000

# Check server status
anon-web status

# View recent logs
anon-web logs

# Stop server
anon-web stop

# Clean old log files (preserves always-redact settings)
anon-web clean
```

### Always-Redact Management

Securely manage custom sensitive terms that should always be anonymized:

```bash
# Add terms to always-redact list
anon-web add-redact "CompanyName"
anon-web add-redact "ProjectCodename"

# Remove terms from always-redact list
anon-web remove-redact "ProjectCodename"

# List all always-redacted terms
anon-web list-redact
```

🔐 **Security Features:**
- Terms stored securely in `~/.anonymizer/always_redact.txt`
- Not visible in GUI or web interfaces (add/remove only)
- Persists across all anonymization operations
- Case-insensitive matching with duplicate prevention
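The case-insensitive duplicate prevention described above can be sketched in a few lines of Python. This is an illustration only: the one-term-per-line file layout and the `add_term` helper are assumptions, not the package's actual storage code, and the demo writes to a temporary directory instead of `~/.anonymizer`.

```python
import tempfile
from pathlib import Path


def add_term(store: Path, term: str) -> bool:
    """Append a term unless a case-insensitive duplicate already exists.

    Hypothetical helper: assumes one term per line in the store file.
    """
    term = term.strip()
    if not term:
        return False
    existing = store.read_text(encoding="utf-8").splitlines() if store.exists() else []
    if term.casefold() in {line.strip().casefold() for line in existing}:
        return False  # already present, ignoring case
    store.parent.mkdir(parents=True, exist_ok=True)
    with store.open("a", encoding="utf-8") as handle:
        handle.write(term + "\n")
    return True


# Demo against a throwaway directory rather than the real ~/.anonymizer
demo_store = Path(tempfile.mkdtemp()) / "always_redact.txt"
print(add_term(demo_store, "ProjectTitan"))  # True: first insert
print(add_term(demo_store, "projecttitan"))  # False: same term, different case
```

Comparing with `str.casefold()` rather than `lower()` keeps the check robust for non-ASCII terms.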

### Python API

```python
from anonymizer_core import redact

# Basic anonymization
result = redact("John Doe works at Microsoft in Seattle.")
print(result.text)
# Output: "<REDACTED> works at <REDACTED> in <REDACTED>."

# Always-redact terms are automatically applied
# (managed via CLI commands shown above)
result = redact("Contact john@acme.com about AcmeProject details.")
print(result.text)
# Output: "Contact <REDACTED> about <REDACTED> details."
# (if "AcmeProject" was added to always-redact list)
```

## 🔐 Data Security & Privacy

### Always-Redact Terms
- **Secure Storage**: Custom sensitive terms are stored in `~/.anonymizer/always_redact.txt`
- **No Shipping**: The file is created locally on first use, never shipped with the package
- **Privacy-First**: Terms are not exposed through GUI or web interfaces
- **CLI-Only Access**: Terms can only be viewed via command line for security
- **Persistent**: Settings survive application updates and log cleanups

### File Locations
```bash
# User data directory
~/.anonymizer/
├── always_redact.txt         # Your custom sensitive terms
├── gui_YYYYMMDD_HHMMSS.log  # GUI application logs
└── web_server_*.log         # Web server logs
```
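Because log names carry a timestamp, finding the newest one is a common first step when debugging. The sketch below picks the most recently modified file matching a glob pattern. `latest_log` is a hypothetical helper, not part of the package, and the demo builds a temporary directory that mimics the layout above.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def latest_log(data_dir: Path, pattern: str) -> Optional[Path]:
    """Return the most recently modified file matching pattern, or None."""
    if not data_dir.is_dir():
        return None
    logs = sorted(data_dir.glob(pattern), key=lambda p: p.stat().st_mtime)
    return logs[-1] if logs else None


# Demo on a temporary directory with two fake GUI logs
demo_dir = Path(tempfile.mkdtemp())
for offset, name in enumerate(["gui_20250101_090000.log", "gui_20250102_090000.log"]):
    path = demo_dir / name
    path.write_text("demo\n", encoding="utf-8")
    # Set explicit modification times so the ordering is deterministic
    os.utime(path, (1_700_000_000 + offset, 1_700_000_000 + offset))

newest = latest_log(demo_dir, "gui_*.log")
print(newest.name)  # gui_20250102_090000.log
```

For real use, point it at the data directory, for example `latest_log(Path.home() / ".anonymizer", "web_server_*.log")`.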

### Data Flow
1. **Input Text** → **Standard PII Detection** (emails, phones, etc.)
2. **Input Text** → **Always-Redact Terms** (your custom words) 
3. **Combined Results** → **Final Anonymized Output**
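The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the email regex is deliberately simplified, and `pattern_spans`, `term_spans`, and `apply_spans` are hypothetical names.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int]

# Simplified email pattern, for illustration only
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def pattern_spans(text: str) -> List[Span]:
    """Step 1: standard PII detection (emails only in this sketch)."""
    return [m.span() for m in EMAIL.finditer(text)]


def term_spans(text: str, terms: List[str]) -> List[Span]:
    """Step 2: always-redact terms, whole words, ignoring case."""
    spans: List[Span] = []
    for term in terms:
        rx = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        spans.extend(m.span() for m in rx.finditer(text))
    return spans


def apply_spans(text: str, spans: List[Span], mask: str = "<REDACTED>") -> str:
    """Step 3: merge overlapping spans, then replace from right to left."""
    merged: List[Span] = []
    for start, end in sorted(spans):
        if merged and start < merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    for start, end in reversed(merged):
        text = text[:start] + mask + text[end:]
    return text


sample = "Contact john@acme.com about AcmeProject details."
spans = pattern_spans(sample) + term_spans(sample, ["acmeproject"])
print(apply_spans(sample, spans))
# Contact <REDACTED> about <REDACTED> details.
```

Replacing from the end of the string backwards keeps earlier offsets valid while the text changes length.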

## 🔧 Advanced Usage

### GUI Features
- **Modern Interface**: Clean, intuitive design with real-time processing
- **Secure Term Management**: Add/remove always-redact terms without exposure
- **File Processing**: Load and save text files directly
- **Background Processing**: Non-blocking anonymization with progress indicators

### Web API Features
- **RESTful Endpoints**: Standard HTTP API for integration
- **File Upload**: Process text files via web interface  
- **JSON Response**: Structured output with metadata
- **Health Checks**: Monitor service status programmatically

### CLI Management
- **Process Control**: Start/stop/status for web server
- **Log Management**: View and clean application logs
- **Term Management**: Secure always-redact term administration
- **Background Operation**: All services run detached from terminal

## 🛠️ Technical Details

### Anonymization Engine
- **Multi-Tier Processing**: Pattern-based → Always-redact → NER fallback
- **Position Tracking**: Prevents overlapping redactions for accuracy
- **Case Insensitive**: Always-redact terms match regardless of case
- **Word Boundaries**: Only complete words are redacted (not partial matches)
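One way to read the tier ordering and position tracking above: detectors run in priority order, and a later tier may only claim text that no earlier tier has already claimed. The sketch below illustrates that idea along with word-boundary, case-insensitive term matching. `claim_spans` and `term_detector` are hypothetical names, and the real engine may resolve conflicts differently.

```python
import re
from typing import Callable, List, Tuple

Span = Tuple[int, int]
Detector = Callable[[str], List[Span]]


def overlaps(a: Span, b: Span) -> bool:
    """True when two half-open spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def claim_spans(text: str, tiers: List[Detector]) -> List[Span]:
    """Run detectors in priority order; drop spans that overlap an earlier claim."""
    claimed: List[Span] = []
    for detect in tiers:
        for span in detect(text):
            if not any(overlaps(span, taken) for taken in claimed):
                claimed.append(span)
    return sorted(claimed)


def term_detector(term: str) -> Detector:
    """Match a term as a whole word, ignoring case."""
    rx = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return lambda text: [m.span() for m in rx.finditer(text)]


text = "acme shipped AcmeCorp laptops"
# "acme" matches despite the case difference; "AcmeCorp" is not a whole-word match
print(claim_spans(text, [term_detector("Acme")]))  # [(0, 4)]
```

An NER stage would slot in as one more `Detector` at the end of the `tiers` list. Note that `\b` only works as intended for terms that start and end with a word character.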

### Supported Entity Types
- **Emails**: john@example.com
- **URLs**: https://example.com  
- **IP Addresses**: 192.168.1.1
- **Phone Numbers**: +1-555-123-4567
- **Custom Terms**: Your always-redact list
- **Names**: Via NER when available
- **Organizations**: Via NER when available
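For the pattern-based entity types listed above, detection typically comes down to a table of regular expressions. The patterns below are simplified assumptions for illustration and are not the ones shipped with the package. Production patterns need to handle many more formats, and the URL pattern here would also capture trailing punctuation.

```python
import re
from typing import List, Tuple

# Illustrative patterns only; real-world PII patterns are more involved
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "URL": re.compile(r"https?://\S+"),
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "PHONE": re.compile(r"\+?\d{1,3}(?:[-. ]\d{2,4}){2,4}"),
}


def find_entities(text: str) -> List[Tuple[str, str]]:
    """Return (label, matched text) pairs, grouped by pattern in table order."""
    found: List[Tuple[str, str]] = []
    for label, rx in PATTERNS.items():
        found.extend((label, m.group()) for m in rx.finditer(text))
    return found


sample = (
    "Mail john@example.com or see https://example.com "
    "from 192.168.1.1 then call +1-555-123-4567"
)
for label, value in find_entities(sample):
    print(label, value)
```

Names and organizations are not covered here because, as noted above, they rely on NER rather than fixed patterns.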

## 📋 Examples & Use Cases

### Basic Anonymization
```python
from anonymizer_core import redact

text = "Please contact John Smith at john.smith@acme.com or call +1-555-0123."
result = redact(text)
print(result.text)
# Output: "Please contact <REDACTED> at <REDACTED> or call <REDACTED>."
```

### Company-Specific Anonymization
```bash
# Set up company-specific terms
anon-web add-redact "AcmeCorp"
anon-web add-redact "ProjectTitan"
anon-web add-redact "confidential"

# Now these terms are always redacted
python -c "
from anonymizer_core import redact
text = 'AcmeCorp confidential: ProjectTitan budget is 500K'
print(redact(text).text)
"
# Output: "<REDACTED> <REDACTED>: <REDACTED> budget is 500K"
```

### Enterprise Integration
```python
# Configure once via CLI
# anon-web add-redact "YourCompanyName"
# anon-web add-redact "YourProduct"

# Use in your application
from anonymizer_core import redact

def process_support_ticket(ticket_text):
    """Anonymize support tickets before logging."""
    result = redact(ticket_text)
    return result.text

# All company-specific terms are automatically redacted
anonymized = process_support_ticket(
    "Customer john@email.com reported YourProduct crashed on YourCompanyName servers."
)
print(anonymized)
# Output: "Customer <REDACTED> reported <REDACTED> crashed on <REDACTED> servers."
```
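If tickets reach your logs through Python's `logging` module, a filter can apply redaction centrally instead of at every call site. The sketch below takes the redaction function as a parameter so it runs without the package installed. `RedactingFilter` and `fake_redact` are hypothetical names; in real use you would pass something like `lambda s: redact(s).text`.

```python
import logging
from typing import Callable


class RedactingFilter(logging.Filter):
    """Rewrite each record's message through a redaction callable."""

    def __init__(self, redact_text: Callable[[str], str]) -> None:
        super().__init__()
        self._redact_text = redact_text

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the message first so %-style arguments are redacted too
        record.msg = self._redact_text(record.getMessage())
        record.args = None
        return True  # keep the record, now with a redacted message


def fake_redact(message: str) -> str:
    """Stand-in for the real redactor so the sketch is self-contained."""
    return message.replace("john@email.com", "<REDACTED>")


logger = logging.getLogger("tickets")
logger.addFilter(RedactingFilter(fake_redact))
logger.warning("Customer %s reported a crash", "john@email.com")
```

A filter attached to a logger only sees records logged directly on that logger. Attach it to a handler instead if records from child loggers should be covered as well.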

### Batch Processing
```bash
# Set up your terms once
anon-web add-redact "SensitiveTerm1"
anon-web add-redact "SensitiveTerm2"

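# Process multiple files - terms persist across all operations
for file in *.txt; do
    # Skip outputs from a previous run
    case "$file" in anonymized_*) continue ;; esac
    # Pass the filename as an argument so quotes or spaces in it cannot break the Python code
    python -c '
import sys
from anonymizer_core import redact
src = sys.argv[1]
with open(src, "r", encoding="utf-8") as f:
    content = f.read()
with open("anonymized_" + src, "w", encoding="utf-8") as f:
    f.write(redact(content).text)
' "$file"
done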
```
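The same batch job can be written in pure Python, which avoids starting a new interpreter (and reloading models) for every file. This is a sketch under assumptions: `anonymize_dir` is a hypothetical helper, and the redactor is passed in as a callable so the example runs with a stand-in. In real use, pass `lambda s: redact(s).text`.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def anonymize_dir(src: Path, dst: Path, redact_text: Callable[[str], str]) -> List[Path]:
    """Write a redacted copy of every .txt file in src to dst."""
    dst.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for path in sorted(src.glob("*.txt")):
        out = dst / path.name
        out.write_text(redact_text(path.read_text(encoding="utf-8")), encoding="utf-8")
        written.append(out)
    return written


# Demo with a stand-in redactor and a temporary directory
demo_src = Path(tempfile.mkdtemp())
(demo_src / "a.txt").write_text("SensitiveTerm1 appears here", encoding="utf-8")
outputs = anonymize_dir(
    demo_src,
    demo_src / "out",
    lambda s: s.replace("SensitiveTerm1", "<REDACTED>"),
)
print(outputs[0].read_text(encoding="utf-8"))  # <REDACTED> appears here
```

Writing to a separate output directory means a second run never picks up already anonymized files as input.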

### Security Audit
```bash
# List all configured terms (CLI only for security)
anon-web list-redact

# Remove terms that are no longer sensitive
anon-web remove-redact "OldProjectName"

# Clean logs while preserving term configuration
anon-web clean
```

## 🚨 Security Best Practices

### Always-Redact Configuration
- **Review Regularly**: Audit your always-redact terms periodically
- **Principle of Least Privilege**: Only add terms that truly need redaction
- **Team Coordination**: Ensure team members know which terms are configured
- **Backup**: Consider backing up `~/.anonymizer/always_redact.txt` securely

### Production Deployment
- **Isolated Environment**: Deploy in secure, isolated environments
- **Log Management**: Regularly clean logs with `anon-web clean`
- **Access Control**: Restrict CLI access to authorized personnel only
- **Monitor Usage**: Review anonymization logs for compliance

## 📊 CLI Command Reference

### Server Management
```bash
anon-web start [--host HOST] [--port PORT]  # Start web server
anon-web stop                                # Stop web server  
anon-web status                              # Check server status
anon-web logs                                # View recent logs
anon-web clean                               # Clean old logs (preserve settings)
```

### Always-Redact Management
```bash
anon-web add-redact "TERM"                   # Add term to always-redact list
anon-web remove-redact "TERM"                # Remove term from list
anon-web list-redact                         # List all terms (CLI only)
```

### GUI Launch
```bash
anon-gui                                     # Launch GUI application
```

## 🔍 Troubleshooting

### Common Issues

**Terms not being redacted?**
- Verify term was added: `anon-web list-redact`
- Check the exact spelling (matching is case-insensitive, so letter case should not matter)
- Ensure word boundaries (partial matches won't work)

**GUI/Web not reflecting new terms?**
- This is by design for security
- Terms are automatically applied during anonymization
- Use CLI `list-redact` to verify configuration

**Server won't start?**
- Check if port is already in use: `anon-web status`
- Try different port: `anon-web start --port 8081`
- Check logs: `anon-web logs`
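To check independently whether another process already occupies the port, a short standard-library probe is enough. The sketch below is not part of the package: `port_is_free` is a hypothetical helper that simply tries to open a TCP connection to the address.

```python
import socket


def port_is_free(host: str, port: int) -> bool:
    """Return True if nothing accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(0.5)
        # connect_ex returns 0 on success, an errno value otherwise
        return probe.connect_ex((host, port)) != 0


# 8080 is the default port mentioned in the Web Interface section
print(port_is_free("127.0.0.1", 8080))
```

If this prints `False` while `anon-web status` reports no running server, another application is using the port and `--port` should be set to a different value.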

**Performance issues?**
- Clean old logs: `anon-web clean`
- For large texts, consider batch processing
- Restart services if needed: `anon-web stop && anon-web start`

**SSL/Certificate errors during installation?**
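- Try installing with trusted hosts (the first two cover the package index, the last two cover GitHub-hosted model downloads): `pip install --trusted-host pypi.org --trusted-host files.pythonhosted.org --trusted-host github.com --trusted-host objects.githubusercontent.com simple-anonymizer`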
- For spaCy models: `pip install --trusted-host github.com --trusted-host objects.githubusercontent.com https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.8.0/en_core_web_lg-3.8.0-py3-none-any.whl`
- Or run the model setup utility: `anon-setup-models`
- The anonymizer will still work with always-redact and pattern matching even without advanced NER models

**Models not downloading?**
- Check your internet connection and firewall settings
- Try manual installation using the URLs provided in the Model Setup section
- Use `anon-setup-models` for automatic retry with SSL workarounds
- Verify models are installed: `python -c "import spacy; print(spacy.util.get_installed_models())"`
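spaCy models installed with pip are ordinary importable packages, so their presence can also be checked without importing spaCy itself. The following is a small sketch using only the standard library; `model_installed` is a hypothetical helper name.

```python
import importlib.util


def model_installed(name: str) -> bool:
    """Return True if a top-level package with this name can be imported."""
    return importlib.util.find_spec(name) is not None


for model in ("en_core_web_lg", "en_core_web_sm"):
    print(model, "installed" if model_installed(model) else "missing")
```

This only confirms that the package is present in the current environment. It does not verify that the model version is compatible with the installed spaCy release.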

---

**Need help?** Check the logs in `~/.anonymizer/` for detailed error information.
            

        