cloudmask-aws

- **Version:** 0.4.0
- **Uploaded:** 2025-10-14 01:50:20
- **Requires Python:** >=3.10
- **License:** MIT
- **Keywords:** aws, anonymization, security, privacy, llm, infrastructure, cloud, data-privacy, pii, masking, obfuscation

<div align="center">

<img src="logo.png" alt="CloudMask Logo" width="200"/>

# CloudMask-AWS 🎭

**Anonymize AWS infrastructure identifiers for secure LLM processing**

[![PyPI version](https://badge.fury.io/py/cloudmask-aws.svg)](https://badge.fury.io/py/cloudmask-aws)
[![Python Versions](https://img.shields.io/pypi/pyversions/cloudmask-aws.svg)](https://pypi.org/project/cloudmask-aws/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

</div>

CloudMask helps you safely share AWS infrastructure data with Large Language Models by anonymizing sensitive identifiers while maintaining structure and reversibility.

## Features

- 🔒 **Secure Anonymization**: Hash-based deterministic anonymization
- 🔄 **Reversible**: Complete mapping for unanonymization
- 🏗️ **Structure-Preserving**: Maintains AWS resource ID prefixes (vpc-, i-, etc.)
- ⚙️ **Configurable**: YAML-based configuration for company names and custom patterns
- 🐍 **Dual Interface**: Use as CLI tool or Python library
- 📋 **Clipboard Support**: Direct clipboard anonymization for quick workflows
- 📦 **Modern Python**: Built with Python 3.10+ features (pattern matching, union types)
- 🚀 **Minimal Dependencies**: Only requires PyYAML and pyperclip
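
The idea behind hash-based, deterministic, prefix-preserving anonymization can be illustrated with the standard library alone. The sketch below assumes an HMAC-SHA256 keyed by the seed; CloudMask's actual hashing scheme and output format may differ, and `mask_resource_id` is a hypothetical helper:

```python
import hashlib
import hmac


def mask_resource_id(resource_id: str, seed: str) -> str:
    """Deterministically mask an AWS-style ID while keeping its prefix.

    Hypothetical illustration: the same (seed, resource_id) pair always
    yields the same output, and the 'vpc-', 'i-', ... prefix is preserved.
    """
    prefix, sep, body = resource_id.partition("-")
    digest = hmac.new(seed.encode(), resource_id.encode(), hashlib.sha256).hexdigest()
    if not sep:
        # No recognizable prefix: mask the whole value.
        return digest[: len(resource_id)]
    # Keep the original body length so the masked ID still looks plausible.
    return f"{prefix}-{digest[: len(body)]}"


mapping: dict[str, str] = {}
for rid in ["vpc-abcdef123456", "i-1234567890abcdef0"]:
    mapping[rid] = mask_resource_id(rid, seed="my-secret-seed")

# The original -> masked mapping is what makes the process reversible.
reverse = {masked: original for original, masked in mapping.items()}
print(mapping)
```

Because the output depends only on the seed and the input, the same ID is masked identically across runs, which is what allows mappings from separate runs to be merged.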

## Requirements

- Python 3.10 or higher
- PyYAML 6.0+
- pyperclip (used for clipboard support)

CloudMask leverages modern Python features including structural pattern matching, union type operators, and built-in generic types.

## Installation

```bash
pip install cloudmask-aws
```

## Quick Start

### CLI Usage

```bash
# Generate config file
cloudmask init-config

# Anonymize (mapping stored in ~/.cloudmask/mapping.json by default)
cloudmask anonymize -i infrastructure.txt -o anonymized.txt

# Anonymize clipboard content
cloudmask anonymize --clipboard

# Restore original values
cloudmask unanonymize -i llm-response.txt -o restored.txt

# Restore clipboard content
cloudmask unanonymize --clipboard

# Use custom mapping location if needed
cloudmask anonymize --clipboard -m /path/to/custom-mapping.json
```

### Python Library Usage

```python
from cloudmask import CloudMask, CloudUnmask, Storage

# Anonymize text
mask = CloudMask(seed="my-secret-seed")
anonymized = mask.anonymize("""
    Instance i-1234567890abcdef0 is running in vpc-abcdef123456
    Account: 123456789012
    Company: Acme Corp
""")

# Save mapping to secure central location (~/.cloudmask/mapping.json)
mask.save_mapping(Storage.DefaultMappingPath)

# Unanonymize later
unmask = CloudUnmask(mapping_file=Storage.DefaultMappingPath)
original = unmask.unanonymize(anonymized)
```

### Quick Function Usage

```python
from cloudmask import anonymize, unanonymize

# One-liner anonymization
text, mapping = anonymize(
    "Instance i-123 in account 123456789012",
    seed="my-seed",
    company_names=["Acme Corp"]
)

# Restore original
original = unanonymize(text, mapping)
```
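
Conceptually, restoring text only needs the mapping. Below is a minimal sketch of the reverse step, assuming the mapping is a plain `{original: anonymized}` dict (the actual structure returned by `anonymize` is not documented here). `restore` is a hypothetical helper and the masked values are made up:

```python
import re


def restore(text: str, mapping: dict[str, str]) -> str:
    """Replace every anonymized token with its original value.

    Assumes mapping is {original: anonymized}. Longer tokens are tried
    first so a short token never clobbers part of a longer one.
    """
    reverse = {masked: original for original, masked in mapping.items()}
    if not reverse:
        return text
    alternation = "|".join(
        re.escape(token) for token in sorted(reverse, key=len, reverse=True)
    )
    # A single regex pass avoids re-replacing text that was just restored.
    return re.sub(alternation, lambda m: reverse[m.group(0)], text)


mapping = {"123456789012": "987654321098", "Acme Corp": "Company-7f3a"}
llm_reply = "Account 987654321098 belongs to Company-7f3a."
print(restore(llm_reply, mapping))  # Account 123456789012 belongs to Acme Corp.
```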

## Configuration

Create a `~/.cloudmask/config.yml` file (or use `cloudmask init-config`):

```yaml
company_names:
  - Acme Corp
  - Example Inc
  - MyCompany LLC

custom_patterns:
  - pattern: '\bTICKET-\d{4,6}\b'
    name: ticket
  - pattern: '\bPROJ-[A-Z0-9]+'
    name: project
  # Example: Match company domains
  - pattern: '[a-zA-Z0-9][-a-zA-Z0-9.]*\.acme\.local\b'
    name: domain
  # Example: Match company emails
  - pattern: '[a-zA-Z0-9._%+-]+@acme-corp\.local\b'
    name: email

preserve_prefixes: true
anonymize_ips: true
anonymize_domains: false
seed: my-secret-seed
```

### Custom Pattern Edge Cases

When creating custom patterns, be aware of:

- **Case sensitivity**: Patterns are matched case-insensitively by default
- **Multi-level subdomains**: Use `[-a-zA-Z0-9.]*` to match names such as `app.corp.acme.local`
- **HTML entities**: Entities such as `&quot;` in the input are handled automatically
- **Email variations**: The example email pattern matches `user+tag@domain` and `user.name@domain`
- **URLs and paths**: Patterns also match hosts embedded in URLs, such as `https://server.acme.local/path`
- **Trailing punctuation**: A trailing word boundary (`\b`) keeps sentence punctuation out of the match, so `server.acme.local.` matches as `server.acme.local`

Test your patterns:
```bash
cloudmask anonymize -i test.txt -o output.txt
grep -i "sensitive-term" output.txt  # Should return nothing
```
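
Before running a full anonymization, you can also sanity-check each custom pattern against sample strings with Python's `re` module. This sketch compiles the example patterns from the config above with `re.IGNORECASE`, mirroring the case-insensitive matching described earlier; the sample strings are made up:

```python
import re

# Patterns copied from the example config, compiled case-insensitively.
PATTERNS = {
    "ticket": r"\bTICKET-\d{4,6}\b",
    "project": r"\bPROJ-[A-Z0-9]+",
    "domain": r"[a-zA-Z0-9][-a-zA-Z0-9.]*\.acme\.local\b",
    "email": r"[a-zA-Z0-9._%+-]+@acme-corp\.local\b",
}
COMPILED = {name: re.compile(p, re.IGNORECASE) for name, p in PATTERNS.items()}


def find_matches(text: str) -> list[tuple[str, str]]:
    """Return (pattern_name, matched_text) pairs found in text."""
    return [
        (name, m.group(0))
        for name, rx in COMPILED.items()
        for m in rx.finditer(text)
    ]


SAMPLES = [
    "See ticket-12345 for details.",
    "Deployed to https://app.corp.acme.local/path today.",
    "Ping server.acme.local.",
    "Mail jane.doe+ops@acme-corp.local now.",
]
for sample in SAMPLES:
    print(sample, "->", find_matches(sample))
```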

## What Gets Anonymized?

- ✅ AWS Resource IDs (vpc-, i-, sg-, ami-, etc.)
- ✅ AWS Account IDs (12-digit numbers)
- ✅ AWS ARNs
- ✅ IP Addresses
- ✅ Company names (from config)
- ✅ Custom patterns (via regex)
- ✅ Domain names (optional)
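
To get a feel for what these categories look like in raw text, here is a rough detection sketch using standard-library regexes. These are simplified, hypothetical patterns for illustration only, not CloudMask's own rules (which may be stricter or cover more resource types):

```python
import re

# Simplified, hypothetical detectors for illustration only.
DETECTORS: dict[str, re.Pattern[str]] = {
    # arn:partition:service:region:account-id:resource (region/account may be empty)
    "arn": re.compile(r"\barn:aws[\w-]*:[\w-]+:[\w-]*:(?:\d{12})?:\S+"),
    # A few common resource-ID prefixes followed by 8 to 17 hex characters
    "resource_id": re.compile(
        r"\b(?:vpc|subnet|sg|i|ami|vol|snap|igw|rtb|eni)-[0-9a-f]{8,17}\b"
    ),
    # Any standalone 12-digit number is treated as a possible account ID
    "account_id": re.compile(r"\b\d{12}\b"),
    # Dotted-quad IPv4 (does not check that each octet is <= 255)
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def detect(text: str) -> dict[str, list[str]]:
    """Return all matches per category; categories may overlap
    (for example, an ARN also contains an account ID)."""
    return {name: rx.findall(text) for name, rx in DETECTORS.items()}


sample = "Instance i-1234567890abcdef0 runs in vpc-abcdef123456, account 123456789012, IP 10.0.1.25"
print(detect(sample))
```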

## Use Cases

- 🤖 **LLM Assistance**: Get help with infrastructure without exposing sensitive IDs
- 📊 **Data Sharing**: Share infrastructure diagrams and configs safely
- 🔍 **Security Analysis**: Analyze configs with external tools
- 📝 **Documentation**: Create shareable examples from real infrastructure

## Advanced Usage

### Using in Scripts

```python
from pathlib import Path

from cloudmask import CloudMask, Config

# Load custom config
config = Config.from_yaml(Path("custom-config.yaml"))

# Create anonymizer
mask = CloudMask(config=config, seed="production-seed")

# Make sure the output directory exists before writing into it
out_dir = Path("anonymized")
out_dir.mkdir(exist_ok=True)

# Process multiple files; one CloudMask instance accumulates a single mapping
for src in Path("configs").glob("*.yaml"):
    mask.anonymize_file(src, out_dir / src.name)

# Save single mapping for all files
mask.save_mapping("master-mapping.json")
```

### Context Manager

```python
from cloudmask import TemporaryMask

with TemporaryMask(seed="temp-seed") as mask:
    anonymized = mask.anonymize("vpc-123 i-456")
    # Process anonymized data
    # Mapping is discarded after context exits
```

### Batch Processing

```python
from cloudmask import create_batch_anonymizer

# Create reusable anonymizer
anon = create_batch_anonymizer(seed="batch-seed")

# Use it multiple times
result1 = anon("vpc-123")
result2 = anon("i-456")
```

## Central Storage

CloudMask stores mapping files in a secure central location by default:

```
~/.cloudmask/mapping.json
```

### Benefits

- 🔐 **Automatic Security**: Files created with `600` permissions (owner read/write only)
- 📍 **Centralized**: All mappings in one location
- 🔄 **Auto-Merge**: New mappings are merged with existing ones (same seed required)
- ⚡ **Convenient**: No need to specify `-m` flag for common workflows
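
A minimal sketch of how a private directory (`700`) and mapping file (`600`) can be created with the Python standard library on POSIX systems. It illustrates the permission model described above and is not CloudMask's actual storage code; `write_private_json` is a hypothetical helper:

```python
import json
import os
import stat
import tempfile
from pathlib import Path


def write_private_json(path: Path, data: dict) -> None:
    """Write JSON so that only the owner can read or write it (POSIX only)."""
    path.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    # mkdir's mode is reduced by the umask and ignored for existing
    # directories, so set 700 explicitly.
    os.chmod(path.parent, stat.S_IRWXU)
    # Create the file as 600 up front so it is never briefly world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2)
    # The O_CREAT mode does not apply to a file that already existed.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


# Demo in a throwaway directory instead of the real ~/.cloudmask
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / ".cloudmask" / "mapping.json"
    write_private_json(target, {"vpc-abcdef123456": "vpc-3f9c2a71b04e"})
    print(oct(stat.S_IMODE(target.stat().st_mode)))  # 0o600 on POSIX
```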

### Seed Verification

CloudMask tracks which seed created each mapping file. Mappings can only be merged if they use the same seed:

```python
from cloudmask import CloudMask, Storage

# First use
mask1 = CloudMask(seed="prod-seed")
mask1.anonymize("vpc-11111")
mask1.save_mapping(Storage.DefaultMappingPath)  # Creates mapping

# Later use with SAME seed - mappings merge automatically
mask2 = CloudMask(seed="prod-seed")
mask2.anonymize("vpc-22222")
mask2.save_mapping(Storage.DefaultMappingPath)  # Merges with existing

# Different seed - prevents accidental corruption
mask3 = CloudMask(seed="different-seed")
mask3.save_mapping(Storage.DefaultMappingPath)  # ERROR: Seed mismatch
```

### Custom Locations

You can still use custom mapping locations:

```bash
cloudmask anonymize -i file.txt -m /custom/path/mapping.json
```

```python
mask.save_mapping("/custom/path/mapping.json")
```

## Security Notes

⚠️ **Keep your mapping files secure!** They contain the original values and everything needed to reverse the anonymization.

- The `~/.cloudmask/` directory is automatically secured with `700` permissions
- Mapping files are created with `600` permissions (owner only)
- Store mapping files separately from anonymized data
- Use strong, unique seeds for different projects
- Don't commit mapping files to version control
- Consider encrypting mapping files for sensitive data
- Always use the same seed for a project to enable mapping merges
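
For strong, unique seeds, Python's standard `secrets` module is a reasonable source of randomness. A minimal sketch; generate the seed once per project and reuse it (how you store it, for example in a password manager or the `seed:` key of your config, is up to you):

```python
import secrets

# 32 random bytes, URL-safe base64 encoded without padding (43 characters).
seed = secrets.token_urlsafe(32)
print(seed)
```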

## Contributing

Contributions welcome! Please check out the [GitHub repository](https://github.com/sam-fakhreddine/cloudmask-aws).

## License

MIT License - see LICENSE file for details

## Troubleshooting

### Pattern Not Matching?

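1. Test the regex: `python -c "import re; print(re.search(r'your-pattern', 'test-string', re.IGNORECASE))"`
2. Check case sensitivity: Patterns are matched case-insensitively
3. Verify word boundaries: `\b` only matches between a word character and a non-word character, so it may not match where you expect when a pattern starts or ends with a character such as `-`, `.`, or `@`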
4. Test with real data: Use small samples first

### Data Still Leaking?

1. Check pattern order: Custom patterns run before company names
2. Verify pattern names: Use generic names like `domain`, not `acme-domain`, since pattern names can show up in the anonymized output
3. Test thoroughly: `grep -i "sensitive" output.txt`
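
If `grep` is not available (for example on Windows), a small Python check does the same job. This is a hypothetical helper, not part of CloudMask; pass the terms that must never appear in the output:

```python
import re
from pathlib import Path


def find_leaks(text: str, sensitive_terms: list[str]) -> list[str]:
    """Return the sensitive terms that still appear in text (case-insensitive)."""
    return [
        term
        for term in sensitive_terms
        if re.search(re.escape(term), text, re.IGNORECASE)
    ]


def check_file(path: Path, sensitive_terms: list[str]) -> list[str]:
    """Run find_leaks over a file, e.g. the output of `cloudmask anonymize`."""
    return find_leaks(path.read_text(encoding="utf-8"), sensitive_terms)


# Example: check_file(Path("output.txt"), ["Acme Corp", "123456789012"])
# should return an empty list if nothing leaked.
```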

## Support

- 📖 [Documentation](https://github.com/sam-fakhreddine/cloudmask-aws#readme)
- 🐛 [Issue Tracker](https://github.com/sam-fakhreddine/cloudmask-aws/issues)
- 💬 [Discussions](https://github.com/sam-fakhreddine/cloudmask-aws/discussions)

            

Author: Sam Fakhreddine (sam.fakhreddine@gmail.com) · [Changelog](https://github.com/sam-fakhreddine/cloudmask-aws/blob/main/CHANGELOG.md)