# AnotiAI PII Masker - Cloud-Powered Privacy Protection
A lightweight Python package for detecting and masking personally identifiable information (PII) in text using cloud-based AI models with optional local fallback.
## 🚀 Features
- **☁️ Cloud-Powered**: Uses state-of-the-art AI models hosted on RunPod for maximum accuracy
- **⚡ Lightning Fast**: ~2-3 seconds inference time (after model warm-up)
- **💡 Intelligent**: Combines multiple detection approaches (rule-based, ML, transformers)
- **🔄 Reversible**: Mask and unmask PII while preserving data structure
- **🛡️ Privacy-First**: No data storage - all processing is ephemeral
- **📦 Lightweight**: Minimal dependencies for cloud mode (~10MB vs ~5GB local)
- **🔧 Flexible**: Support for both cloud and local inference modes
- **🎯 Simple API**: Just provide your user API key - no complex setup required
## 🔧 Installation
### Cloud Mode (Recommended)
```bash
pip install anotiai-pii-masker
```
### Local Mode (Full Dependencies)
```bash
# Quotes keep shells such as zsh from expanding the brackets
pip install "anotiai-pii-masker[local]"
```
### Development
```bash
pip install "anotiai-pii-masker[dev]"
```
## 🚀 Quick Start
### Cloud Inference (Default)
```python
from anotiai_pii_masker import WhosePIIGuardian
# Simple setup - just provide your user API key
guardian = WhosePIIGuardian(
    user_api_key="your_jwt_api_key"
)
# Mask PII in text
text = "Hi, I'm John Doe and my email is john.doe@company.com"
result = guardian.mask_text(text)
print(f"Original: {text}")
print(f"Masked: {result['masked_text']}")
# Output: "Hi, I'm [REDACTED_NAME_1] and my email is [REDACTED_EMAIL_1]"
# Unmask when needed
unmask_result = guardian.unmask_text(result['masked_text'], result['pii_map'])
print(f"Unmasked: {unmask_result['unmasked_text']}")
```
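To make the mask/unmask round trip concrete without calling the cloud service, here is a minimal local sketch of reversible masking. It is not the package's implementation: it detects only email addresses with a deliberately simple regular expression, and the helper names `mask_emails` and `unmask`, along with the flat shape of the map, are assumptions chosen for illustration.

```python
import re
from typing import Dict, Tuple

# Deliberately simple email pattern; real detectors are far more thorough.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def mask_emails(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace each email with a numbered placeholder and remember the original."""
    pii_map: Dict[str, str] = {}

    def _replace(match: re.Match) -> str:
        placeholder = f"[REDACTED_EMAIL_{len(pii_map) + 1}]"
        pii_map[placeholder] = match.group(0)
        return placeholder

    return EMAIL_RE.sub(_replace, text), pii_map


def unmask(masked_text: str, pii_map: Dict[str, str]) -> str:
    """Restore the original values by substituting the placeholders back."""
    for placeholder, original in pii_map.items():
        masked_text = masked_text.replace(placeholder, original)
    return masked_text


original = "Contact john.doe@company.com or jane@example.org"
masked, mapping = mask_emails(original)
restored = unmask(masked, mapping)
```

The key property is that the placeholder map is the only thing needed to reverse the masking, which is why the map itself should be treated as sensitive data.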
### Local Inference (Fallback)
```python
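# Requires: pip install "anotiai-pii-masker[local]"
from anotiai_pii_masker import WhosePIIGuardian

guardian = WhosePIIGuardian(local_mode=True)
text = "Hi, I'm John Doe and my email is john.doe@company.com"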
# Same API as cloud mode
result = guardian.mask_text(text)
print(f"Masked: {result['masked_text']}")
```
### Cloud with Local Fallback
```python
# Automatically falls back to local if cloud fails
guardian = WhosePIIGuardian(
    user_api_key="your_jwt_api_key",
    local_fallback=True
)
```
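The fallback behaviour can be pictured as a try-the-cloud-then-go-local wrapper. The sketch below illustrates that general pattern only and makes no claim about the package's internals; `cloud_mask` and `local_mask` are hypothetical stand-ins for the two back ends, and the `backend` key is added purely so the example can show which path ran.

```python
from typing import Callable, Dict


def mask_with_fallback(
    text: str,
    primary: Callable[[str], Dict],
    fallback: Callable[[str], Dict],
) -> Dict:
    """Try the primary back end first; use the fallback if it raises."""
    try:
        result = primary(text)
        result["backend"] = "primary"
    except Exception:
        # A broad catch keeps the sketch short; narrow it in real code.
        result = fallback(text)
        result["backend"] = "fallback"
    return result


def cloud_mask(text: str) -> Dict:
    """Hypothetical cloud call that simulates an outage."""
    raise ConnectionError("cloud endpoint unreachable")


def local_mask(text: str) -> Dict:
    """Hypothetical local masker that redacts the whole input."""
    return {"masked_text": "[REDACTED]"}


outcome = mask_with_fallback("Hi, I'm John Doe", cloud_mask, local_mask)
```

Recording which back end produced a result is useful in practice, because local and cloud inference differ considerably in latency and memory use.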
## 🔑 Getting API Credentials
1. **Get your JWT API key** from AnotiAI (contact: emmanuel@anotiai.com)
2. **No RunPod setup required** - credentials are handled automatically
3. **Simple usage** - just provide your user API key
```python
# That's it! No complex setup needed
guardian = WhosePIIGuardian(user_api_key="your_jwt_api_key")
```
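Hard-coding the key in source files makes it easy to leak, so one common approach is to read it from the environment. The sketch below assumes a variable named `ANOTIAI_API_KEY`; that name and the `load_api_key` helper are hypothetical, and nothing here suggests the package reads this variable on its own.

```python
import os


def load_api_key(env_var: str = "ANOTIAI_API_KEY") -> str:
    """Read the API key from the environment and fail loudly if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

The returned value can then be passed as `WhosePIIGuardian(user_api_key=load_api_key())`.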
## 📖 Advanced Usage
### Detection Only
```python
# Get detected entities without masking
result = guardian.detect_pii(text)
print(f"Found {result['entities_found']} PII entities")
for token, entity in result['pii_map'].items():
    print(f"- {entity['label']}: {entity['value']} (confidence: {entity['confidence']})")
```
### Confidence Thresholds
```python
# Adjust sensitivity (0.0 = very sensitive, 1.0 = very strict)
result = guardian.mask_text(text, confidence_threshold=0.8)
```
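To see what a threshold does, the sketch below filters a made-up set of detections by confidence. It assumes each entity carries a `confidence` field, as in the detection example, and that the comparison is inclusive; the `filter_by_confidence` helper is hypothetical and the service may apply its threshold differently.

```python
from typing import Dict


def filter_by_confidence(pii_map: Dict[str, dict], threshold: float) -> Dict[str, dict]:
    """Keep only entities whose confidence meets the threshold."""
    return {
        token: entity
        for token, entity in pii_map.items()
        if entity["confidence"] >= threshold
    }


# Made-up detections for illustration only.
detections = {
    "[REDACTED_NAME_1]": {"label": "NAME", "value": "John Doe", "confidence": 0.95},
    "[REDACTED_DATE_1]": {"label": "DATE", "value": "May", "confidence": 0.55},
}

strict = filter_by_confidence(detections, 0.8)   # keeps only the name
lenient = filter_by_confidence(detections, 0.5)  # keeps both entities
```

A higher threshold reduces false positives at the cost of letting low-confidence PII through unmasked, so the right value depends on which error is more costly for your data.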
### Token Usage Tracking
```python
# All methods return detailed token usage information
result = guardian.mask_text(text)
print(f"Input tokens: {result['usage']['input_tokens']}")
print(f"Output tokens: {result['usage']['output_tokens']}")
print(f"Total tokens: {result['usage']['total_tokens']}")
# Usage tracking for unmasking
unmask_result = guardian.unmask_text(result['masked_text'], result['pii_map'])
print(f"Unmasked tokens: {unmask_result['usage']['output_tokens']}")
```
### Error Handling
```python
from anotiai_pii_masker import WhosePIIGuardian, APIError
try:
    guardian = WhosePIIGuardian(user_api_key="your_jwt_api_key")
    result = guardian.mask_text(text)
except APIError as e:
    print(f"API Error: {e}")
except Exception as e:
    print(f"Error: {e}")
```
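For transient failures such as timeouts, a retry with exponential backoff is often more useful than printing the error. The following is a generic sketch, not part of the package: `TransientError` is a local stand-in exception, and `call_with_retry` is a hypothetical helper. Which real exceptions are safe to retry depends on the package's error types, which are not documented here.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Local stand-in for a retryable failure such as a network timeout."""


def call_with_retry(func: Callable[[], T], attempts: int = 3, base_delay: float = 0.5) -> T:
    """Call func, retrying on TransientError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except TransientError:
            if attempt == attempts - 1:
                raise  # Out of retries: surface the last error.
            time.sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be at least 1")


calls = {"count": 0}


def flaky() -> str:
    """Fail twice, then succeed, to exercise the retry loop."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("temporary failure")
    return "ok"


retry_result = call_with_retry(flaky, attempts=3, base_delay=0.0)
```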
### Health Check
```python
# Check if the service is healthy
health = guardian.health_check()
print(f"Service status: {health['status']}")
```
## 🏗️ Architecture
```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Your App      │    │  anotiai-pii-    │    │  RunPod Cloud   │
│                 │───▶│  masker          │───▶│                 │
│ guardian.mask() │    │  (lightweight)   │    │  GPU Models     │
└─────────────────┘    └──────────────────┘    │  • DeBERTa      │
                                               │  • RoBERTa      │
                                               │  • Presidio     │
                                               │  • spaCy        │
                                               └─────────────────┘
```
## 🔧 Technical Implementation
### **Token Counting Algorithm**
The package counts "tokens" by character length rather than with a model tokenizer:
```python
from typing import Any


def count_tokens(data: Any) -> int:
    """
    Calculates token count based on character length:
    - Strings: Character count
    - Dicts/Lists: JSON serialization length
    - Other types: String representation length
    """
```
### **Usage Tracking Flow**
1. **Input Processing**: Counts original text characters
2. **Output Processing**: Counts masked text + PII map JSON
3. **Total Calculation**: Sums input and output tokens
4. **Billing Integration**: Returns structured usage data
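The counting rules and the tracking flow above can be sketched as follows. This is an illustration based only on the docstring and the four steps listed; the exact JSON serialization settings (separators, ASCII escaping) used by the service are not stated, so `json.dumps` defaults are an assumption, and `build_usage` is a hypothetical helper name.

```python
import json
from typing import Any, Dict


def count_tokens(data: Any) -> int:
    """Character-based count following the rules in the docstring."""
    if isinstance(data, str):
        return len(data)
    if isinstance(data, (dict, list)):
        # Assumption: default json.dumps formatting.
        return len(json.dumps(data))
    return len(str(data))


def build_usage(original_text: str, masked_text: str, pii_map: Dict[str, Any]) -> Dict[str, int]:
    """Assemble a usage record with input, output, and total counts."""
    input_tokens = count_tokens(original_text)
    output_tokens = count_tokens(masked_text) + count_tokens(pii_map)
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": input_tokens + output_tokens,
    }


usage = build_usage("My name is John Doe", "My name is [REDACTED_NAME_1]", {})
```

Because the PII map is serialized into the output count, texts with many detected entities will report noticeably more output tokens than input tokens.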
### **API Response Format**
```python
{
    "masked_text": "My name is [REDACTED_NAME_1]",
    "pii_map": {"__TOKEN_1__": {...}},
    "entities_found": 1,
    "confidence_threshold": 0.5,
    "usage": {
        "input_tokens": 15,   # Original text length
        "output_tokens": 25,  # Masked text + PII map JSON
        "total_tokens": 40    # Total for billing
    }
}
```
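If you depend on this response shape downstream, a small sanity check can catch missing fields early. The sketch below is an assumption-laden example: `check_response` and `REQUIRED_KEYS` are hypothetical names, and the set of required keys is taken only from the sample response above.

```python
from typing import Any, Dict

# Keys taken from the sample response; the real schema may include more.
REQUIRED_KEYS = {"masked_text", "pii_map", "entities_found", "usage"}


def check_response(response: Dict[str, Any]) -> None:
    """Raise ValueError if keys are missing or the usage totals disagree."""
    missing = REQUIRED_KEYS - set(response)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    usage = response["usage"]
    if usage["input_tokens"] + usage["output_tokens"] != usage["total_tokens"]:
        raise ValueError("usage totals are inconsistent")


# Illustrative response using the numbers from the sample above.
sample_response = {
    "masked_text": "My name is [REDACTED_NAME_1]",
    "pii_map": {},
    "entities_found": 1,
    "confidence_threshold": 0.5,
    "usage": {"input_tokens": 15, "output_tokens": 25, "total_tokens": 40},
}
check_response(sample_response)
```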
## 📊 Supported PII Types
- **Personal**: Names, dates of birth, addresses
- **Contact**: Email addresses, phone numbers, URLs
- **Financial**: Credit card numbers, bank accounts
- **Government**: SSNs, passport numbers, license numbers
- **Healthcare**: Medical license numbers
- **Technical**: IP addresses, crypto addresses
## 🔒 Security & Privacy
- **No Data Storage**: All processing is ephemeral
- **Encrypted Transit**: HTTPS/TLS for all API communications
- **Reversible Masking**: Original data can be restored when needed
- **Configurable Thresholds**: Adjust sensitivity based on your needs
## 🚨 Migration from v1.x
Version 2.0 introduces a cloud-first architecture with a simplified API. To migrate:
```python
# v1.x (local only)
from anotiai_pii_masker import WhosePIIGuardian
guardian = WhosePIIGuardian()
masked_text, pii_map = guardian.mask_text(text)
# v2.x (cloud-first with simplified API)
guardian = WhosePIIGuardian(user_api_key="your_jwt_api_key")
result = guardian.mask_text(text)
masked_text = result['masked_text']
pii_map = result['pii_map']
```
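If existing call sites still expect the v1.x tuple, a thin adapter can ease the transition. This is a sketch; `as_v1_tuple` is a hypothetical helper that is not provided by the package, and the sample result below is illustrative.

```python
from typing import Any, Dict, Tuple


def as_v1_tuple(result: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Convert a v2-style result dict into the v1-style (masked_text, pii_map) pair."""
    return result["masked_text"], result["pii_map"]


# Stand-in for a v2 result; the values are illustrative.
v2_result = {
    "masked_text": "Hi, I'm [REDACTED_NAME_1]",
    "pii_map": {"[REDACTED_NAME_1]": {"value": "John Doe"}},
}
masked_text, pii_map = as_v1_tuple(v2_result)
```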
## 📈 Performance
| Mode | Setup Time | Inference Time | Memory Usage | Accuracy |
|------|------------|----------------|--------------|----------|
| Cloud | ~1s | ~2-3s | ~50MB | 99.5% |
| Local | ~30s | ~5-10s | ~8GB | 99.5% |
## 📊 Token Usage & Billing
The package provides comprehensive token usage tracking for accurate billing and monitoring:
### **Automatic Token Counting**
- **Input tokens**: Counted from original text
- **Output tokens**: Counted from masked text + PII map
- **Total tokens**: Sum of input and output tokens
- **JSON serialization**: PII maps are counted as JSON character length
### **Usage Examples**
```python
# Masking with token tracking
result = guardian.mask_text("My name is John Doe")
print(f"Input: {result['usage']['input_tokens']} tokens")
print(f"Output: {result['usage']['output_tokens']} tokens")
print(f"Total: {result['usage']['total_tokens']} tokens")
# Unmasking with token tracking
unmask_result = guardian.unmask_text(result['masked_text'], result['pii_map'])
print(f"Restored: {unmask_result['usage']['output_tokens']} tokens")
```
### **Billing Integration**
```python
# Track usage across multiple operations
texts = ["My name is John Doe", "Email me at john.doe@company.com"]
total_input_tokens = 0
total_output_tokens = 0
for text in texts:
    result = guardian.mask_text(text)
    total_input_tokens += result['usage']['input_tokens']
    total_output_tokens += result['usage']['output_tokens']
print(f"Total processed: {total_input_tokens + total_output_tokens} tokens")
```
## 🎯 Key Benefits
- **Simplified Setup**: Just provide your JWT API key - no complex RunPod configuration
- **Automatic Fallback**: Switches to local mode if the cloud is unavailable (when `local_fallback=True` is set)
- **Production Ready**: Battle-tested on RunPod Serverless infrastructure
- **Cost Effective**: Pay-per-use pricing with no idle costs
- **Enterprise Grade**: Built for scale with proper error handling and monitoring
- **Usage Tracking**: Comprehensive token counting for accurate billing
## 📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
## 🆘 Support
- **Documentation**: [GitHub README](https://github.com/anotiai/anotiai-pii-masker#readme)
- **Issues**: [GitHub Issues](https://github.com/anotiai/anotiai-pii-masker/issues)
- **Email**: emmanuel@anotiai.com
---
**Protect your users' privacy with AnotiAI PII Masker** 🛡️