<div align="center">

<img src="logo.png" alt="CloudMask Logo" width="200"/>

# CloudMask-AWS 🎭

**Anonymize AWS infrastructure identifiers for secure LLM processing**

</div>

CloudMask helps you safely share AWS infrastructure data with Large Language Models by anonymizing sensitive identifiers while maintaining structure and reversibility.
## Features
- 🔒 **Secure Anonymization**: Hash-based deterministic anonymization
- 🔄 **Reversible**: Complete mapping for unanonymization
- 🏗️ **Structure-Preserving**: Maintains AWS resource ID prefixes (vpc-, i-, etc.)
- ⚙️ **Configurable**: YAML-based configuration for company names and custom patterns
- 🐍 **Dual Interface**: Use as CLI tool or Python library
- 📋 **Clipboard Support**: Direct clipboard anonymization for quick workflows
- 📦 **Modern Python**: Built with Python 3.10+ features (pattern matching, union types)
- 🚀 **Minimal Dependencies**: Only requires PyYAML and pyperclip
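To make the first three features concrete, here is a minimal sketch of hash-based, prefix-preserving pseudonymization with a reverse mapping. It is an illustration only and not CloudMask's actual implementation: the function names `pseudonymize` and `restore`, the choice of HMAC-SHA256, and the output length are all assumptions.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, seed: str) -> str:
    """Return a deterministic stand-in for identifier, keeping any 'prefix-' part."""
    # Keyed hash: the same seed and identifier always produce the same digest.
    digest = hmac.new(seed.encode(), identifier.encode(), hashlib.sha256).hexdigest()
    prefix, dash, suffix = identifier.partition("-")
    if dash:
        # Keep the resource-type prefix (vpc-, i-, ...) and replace only the suffix.
        return f"{prefix}-{digest[:len(suffix)]}"
    return digest[:len(identifier)]


def restore(text: str, mapping: dict[str, str]) -> str:
    """Reverse the substitution using a saved masked -> original mapping."""
    for masked, original in mapping.items():
        text = text.replace(masked, original)
    return text


seed = "my-secret-seed"  # hypothetical seed value
original_id = "vpc-abcdef123456"
masked_id = pseudonymize(original_id, seed)
mapping = {masked_id: original_id}

print(masked_id)  # starts with "vpc-" and has the same length as the original
print(restore(f"Traffic flows through {masked_id}", mapping))
```

Because the hash is one-way, reversibility in this sketch comes entirely from the saved mapping, which is why the mapping has to be kept private.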
## Requirements
- Python 3.10 or higher
- PyYAML 6.0+
- pyperclip (used for clipboard support)
CloudMask leverages modern Python features including structural pattern matching, union type operators, and built-in generic types.
## Installation
```bash
pip install cloudmask-aws
```
## Quick Start
### CLI Usage
```bash
# Generate config file
cloudmask init-config
# Anonymize (mapping stored in ~/.cloudmask/mapping.json by default)
cloudmask anonymize -i infrastructure.txt -o anonymized.txt
# Anonymize clipboard content
cloudmask anonymize --clipboard
# Restore original values
cloudmask unanonymize -i llm-response.txt -o restored.txt
# Restore clipboard content
cloudmask unanonymize --clipboard
# Use custom mapping location if needed
cloudmask anonymize --clipboard -m /path/to/custom-mapping.json
```
### Python Library Usage
```python
from cloudmask import CloudMask, CloudUnmask, Storage
# Anonymize text
mask = CloudMask(seed="my-secret-seed")
anonymized = mask.anonymize("""
Instance i-1234567890abcdef0 is running in vpc-abcdef123456
Account: 123456789012
Company: Acme Corp
""")
# Save mapping to secure central location (~/.cloudmask/mapping.json)
mask.save_mapping(Storage.DefaultMappingPath)
# Unanonymize later
unmask = CloudUnmask(mapping_file=Storage.DefaultMappingPath)
original = unmask.unanonymize(anonymized)
```
### Quick Function Usage
```python
from cloudmask import anonymize, unanonymize
# One-liner anonymization
text, mapping = anonymize(
    "Instance i-123 in account 123456789012",
    seed="my-seed",
    company_names=["Acme Corp"]
)
# Restore original
original = unanonymize(text, mapping)
```
## Configuration
Create a `~/.cloudmask/config.yml` file (or use `cloudmask init-config`):
```yaml
company_names:
  - Acme Corp
  - Example Inc
  - MyCompany LLC
custom_patterns:
  - pattern: '\bTICKET-\d{4,6}\b'
    name: ticket
  - pattern: '\bPROJ-[A-Z0-9]+'
    name: project
  # Example: Match company domains
  - pattern: '[a-zA-Z0-9][-a-zA-Z0-9.]*\.acme\.local\b'
    name: domain
  # Example: Match company emails
  - pattern: '[a-zA-Z0-9._%+-]+@acme-corp\.local\b'
    name: email
preserve_prefixes: true
anonymize_ips: true
anonymize_domains: false
seed: my-secret-seed
```
### Custom Pattern Edge Cases
When creating custom patterns, be aware of:
- **Case sensitivity**: Patterns are case-insensitive by default
- **Multi-level subdomains**: Use `[-a-zA-Z0-9.]*` to match `app.corp.acme.local`
- **HTML entities**: `&quot;` in data is handled automatically
- **Email variations**: Pattern supports `user+tag@domain` and `user.name@domain`
- **URLs and paths**: Patterns work in `https://server.acme.local/path`
- **Trailing punctuation**: Word boundaries (`\b`) handle `server.acme.local.`
Test your patterns:
```bash
cloudmask anonymize -i test.txt -o output.txt
grep -i "sensitive-term" output.txt # Should return nothing
```
## What Gets Anonymized?
- ✅ AWS Resource IDs (vpc-, i-, sg-, ami-, etc.)
- ✅ AWS Account IDs (12-digit numbers)
- ✅ AWS ARNs
- ✅ IP Addresses
- ✅ Company names (from config)
- ✅ Custom patterns (via regex)
- ✅ Domain names (optional)
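As a rough illustration of how some of these categories can be detected, the sketch below uses simplified regular expressions. The patterns are assumptions made for this example and are likely narrower than what CloudMask uses internally: they cover only four resource prefixes and IPv4 addresses, and they skip ARNs entirely.

```python
import re

# Simplified, hypothetical detection patterns (not CloudMask's own).
PATTERNS = {
    "resource_id": re.compile(r"\b(?:vpc|i|sg|ami)-[0-9a-f]{8,17}\b"),
    "account_id": re.compile(r"\b\d{12}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def find_identifiers(text: str) -> dict[str, list[str]]:
    """Return every match per category, in order of appearance."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}


sample = (
    "Instance i-1234567890abcdef0 runs in vpc-abcdef123456 "
    "(account 123456789012) at 10.0.1.25"
)
found = find_identifiers(sample)
for category, matches in found.items():
    print(category, matches)
```

A bare 12-digit pattern like the one above will also match unrelated numbers of the same length, so it is worth reviewing the anonymized output on real data.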
## Use Cases
- 🤖 **LLM Assistance**: Get help with infrastructure without exposing sensitive IDs
- 📊 **Data Sharing**: Share infrastructure diagrams and configs safely
- 🔍 **Security Analysis**: Analyze configs with external tools
- 📝 **Documentation**: Create shareable examples from real infrastructure
## Advanced Usage
### Using in Scripts
```python
from cloudmask import CloudMask, Config
from pathlib import Path
# Load custom config
config = Config.from_yaml(Path("custom-config.yaml"))
# Create anonymizer
mask = CloudMask(config=config, seed="production-seed")
# Process multiple files
for file in Path("configs").glob("*.yaml"):
    output = Path("anonymized") / file.name
    mask.anonymize_file(file, output)
# Save single mapping for all files
mask.save_mapping("master-mapping.json")
```
### Context Manager
```python
from cloudmask import TemporaryMask
with TemporaryMask(seed="temp-seed") as mask:
    anonymized = mask.anonymize("vpc-123 i-456")
    # Process anonymized data
    # Mapping is discarded after context exits
```
### Batch Processing
```python
from cloudmask import create_batch_anonymizer
# Create reusable anonymizer
anon = create_batch_anonymizer(seed="batch-seed")
# Use it multiple times
result1 = anon("vpc-123")
result2 = anon("i-456")
```
## Central Storage
CloudMask stores mapping files in a secure central location by default:
```
~/.cloudmask/mapping.json
```
### Benefits
- 🔐 **Automatic Security**: Files created with `600` permissions (owner read/write only)
- 📍 **Centralized**: All mappings in one location
- 🔄 **Auto-Merge**: New mappings are merged with existing ones (same seed required)
- ⚡ **Convenient**: No need to specify `-m` flag for common workflows
### Seed Verification
CloudMask tracks which seed created each mapping file. Mappings can only be merged if they use the same seed:
```python
from cloudmask import CloudMask, Storage
# First use
mask1 = CloudMask(seed="prod-seed")
mask1.anonymize("vpc-11111")
mask1.save_mapping(Storage.DefaultMappingPath) # Creates mapping
# Later use with SAME seed - mappings merge automatically
mask2 = CloudMask(seed="prod-seed")
mask2.anonymize("vpc-22222")
mask2.save_mapping(Storage.DefaultMappingPath) # Merges with existing
# Different seed - prevents accidental corruption
mask3 = CloudMask(seed="different-seed")
mask3.save_mapping(Storage.DefaultMappingPath) # ERROR: Seed mismatch
```
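One way to picture the seed check is a merge that compares a seed fingerprint stored alongside the mapping. The sketch below is a hypothetical illustration: the `seed_fingerprint` field, the helper names, and the use of SHA-256 are assumptions, and CloudMask's actual file format and checks may differ.

```python
import hashlib


def seed_fingerprint(seed: str) -> str:
    """Identify a seed without storing it in clear text (hypothetical scheme)."""
    return hashlib.sha256(seed.encode()).hexdigest()[:16]


def merge_mappings(existing: dict, new_entries: dict[str, str], seed: str) -> dict:
    """Merge new entries into an existing mapping, refusing a different seed."""
    fingerprint = seed_fingerprint(seed)
    if existing and existing["seed_fingerprint"] != fingerprint:
        raise ValueError("Seed mismatch: mapping was created with a different seed")
    merged = dict(existing.get("entries", {}))
    merged.update(new_entries)
    return {"seed_fingerprint": fingerprint, "entries": merged}


# First save creates the mapping; the second merges because the seed matches.
stored = merge_mappings({}, {"vpc-aaaaa": "vpc-11111"}, seed="prod-seed")
stored = merge_mappings(stored, {"vpc-bbbbb": "vpc-22222"}, seed="prod-seed")
print(sorted(stored["entries"]))

# A different seed is rejected instead of silently mixing two mappings.
try:
    merge_mappings(stored, {"vpc-ccccc": "vpc-33333"}, seed="different-seed")
except ValueError as exc:
    print(exc)
```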
### Custom Locations
You can still use custom mapping locations:
```bash
cloudmask anonymize -i file.txt -m /custom/path/mapping.json
```
```python
mask.save_mapping("/custom/path/mapping.json")
```
## Security Notes
⚠️ **Keep your mapping files secure!** They contain the reversible mappings.
- The `~/.cloudmask/` directory is automatically secured with `700` permissions
- Mapping files are created with `600` permissions (owner only)
- Store mapping files separately from anonymized data
- Use strong, unique seeds for different projects
- Don't commit mapping files to version control
- Consider encrypting mapping files for sensitive data
- Always use the same seed for a project to enable mapping merges
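If you write mapping files to a custom location yourself, you may want to apply the same permissions by hand. The following is a minimal sketch for POSIX systems using only the standard library; the helper name `write_private_json` and the demo path are hypothetical, and this is not how CloudMask itself is known to write files.

```python
import json
import os
import stat
import tempfile
from pathlib import Path


def write_private_json(path: Path, data: dict) -> None:
    """Write JSON so that only the owner can read or modify it (POSIX)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    os.chmod(path.parent, 0o700)  # directory: owner-only access
    # Create the file with mode 600 from the start instead of tightening it later.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(data, handle, indent=2)
    os.chmod(path, 0o600)  # also tightens a file that already existed


# Demo in a throwaway directory rather than the real ~/.cloudmask
demo_path = Path(tempfile.mkdtemp()) / "cloudmask-demo" / "mapping.json"
write_private_json(demo_path, {"vpc-aaaaa": "vpc-11111"})

file_mode = stat.S_IMODE(demo_path.stat().st_mode)
dir_mode = stat.S_IMODE(demo_path.parent.stat().st_mode)
print(oct(file_mode), oct(dir_mode))  # on POSIX: 0o600 0o700
```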
## Contributing
Contributions welcome! Please check out the [GitHub repository](https://github.com/sam-fakhreddine/cloudmask-aws).
## License
MIT License - see LICENSE file for details
## Troubleshooting
### Pattern Not Matching?
1. Test the regex: `python -c "import re; print(re.search(r'your-pattern', 'test-string'))"`
2. Check case sensitivity: Patterns are case-insensitive
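3. Verify word boundaries: `\b` only matches between a word character (letter, digit, or `_`) and a non-word character, so it may not behave as expected when a pattern starts or ends with a character such as `#`, `@`, or `.`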
4. Test with real data: Use small samples first
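The word-boundary pitfall is easy to reproduce with the standard `re` module. In this small sketch the sample text and the `#1234` token are made up; the lookbehind variant is one possible workaround, not something CloudMask requires.

```python
import re

text = "see #1234 for details"

# No boundary exists between the space and '#': both are non-word characters.
with_boundary = re.search(r"\b#1234\b", text)

# Dropping the leading \b lets the pattern match.
without_boundary = re.search(r"#1234\b", text)

# (?<!\w) means "not preceded by a word character" and works before any character.
with_lookbehind = re.search(r"(?<!\w)#1234\b", text)

print(with_boundary)             # None
print(without_boundary.group())  # #1234
print(with_lookbehind.group())   # #1234
```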
### Data Still Leaking?
1. Check pattern order: Custom patterns run before company names
2. Verify pattern name: Use generic names like `domain`, not `acme-domain`
3. Test thoroughly: `grep -i "sensitive" output.txt`
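For more than one or two terms, the `grep` check can be scripted. The sketch below is a hypothetical helper (`find_leaks` is not part of CloudMask) that does a case-insensitive literal search for each sensitive term and reports the line numbers where it still appears.

```python
import re


def find_leaks(text: str, sensitive_terms: list[str]) -> dict[str, list[int]]:
    """Map each sensitive term still present to the 1-based lines where it appears."""
    leaks: dict[str, list[int]] = {}
    for term in sensitive_terms:
        # re.escape treats the term as literal text, not as a regex.
        pattern = re.compile(re.escape(term), re.IGNORECASE)
        lines = [
            number
            for number, line in enumerate(text.splitlines(), start=1)
            if pattern.search(line)
        ]
        if lines:
            leaks[term] = lines
    return leaks


# Made-up anonymized output in which an email domain slipped through
output = "Instance i-0f3a9c running for company-7f2e\nContact: ops@ACME-corp.local\n"
print(find_leaks(output, ["Acme Corp", "acme-corp", "123456789012"]))
# {'acme-corp': [2]}
```

An empty result means none of the listed terms were found. It does not prove that the output is free of other sensitive values.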
## Support
- 📖 [Documentation](https://github.com/sam-fakhreddine/cloudmask-aws#readme)
- 🐛 [Issue Tracker](https://github.com/sam-fakhreddine/cloudmask-aws/issues)
- 💬 [Discussions](https://github.com/sam-fakhreddine/cloudmask-aws/discussions)