regex-vault


| Field | Value |
| --- | --- |
| Name | regex-vault |
| Version | 0.1.2 |
| Summary | A general-purpose engine that detects and masks personal information by country and information type |
| Upload time | 2025-10-08 11:44:44 |
| Requires Python | >=3.9 |
| License | MIT |
| Keywords | pii, detection, redaction, regex, privacy, gdpr |
| Requirements | pyyaml, jsonschema, click, fastapi, uvicorn, grpcio, grpcio-tools, prometheus-client, pydantic, python-multipart |

# regex-vault

**regex-vault** is a general-purpose engine that detects and masks personal information (mobile phone numbers, social security numbers, email addresses, etc.) by **country and information type**. It combines three parts: pattern files, a library, and a daemon (server).

## Features

- 🌍 **Global Support**: Patterns organized by country (ISO2) and information type
- 🔍 **Detection**: Find PII in text using multiple patterns
- ✅ **Validation**: Validate text against specific patterns with optional verification functions
- 🔒 **Redaction**: Mask, hash, or tokenize sensitive information
- 🚀 **Multiple Interfaces**: Library API, CLI, and HTTP/gRPC server
- ⚡ **High Performance**: p95 < 10ms for 1KB text (single namespace)
- 🔄 **Hot Reload**: Non-disruptive pattern reloading
- 📊 **Observability**: Prometheus metrics and health checks
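
The three redaction strategies named above (mask, hash, tokenize) differ mainly in whether the original value can be recovered. The following is a minimal standard-library sketch of that difference. The names `mask`, `hash_value`, and `Tokenizer` are hypothetical and are not part of the regex-vault API.

```python
import hashlib
import secrets


def mask(value: str, keep_last: int = 4, fill: str = "*") -> str:
    """Replace every character except the last `keep_last` with `fill`."""
    if keep_last <= 0:
        return fill * len(value)
    return fill * max(len(value) - keep_last, 0) + value[-keep_last:]


def hash_value(value: str, salt: bytes = b"") -> str:
    """Return a hex SHA-256 digest of the salted value (one-way)."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


class Tokenizer:
    """Map each distinct value to a random token and keep a lookup for reversal."""

    def __init__(self) -> None:
        self._forward = {}  # original value -> token
        self._reverse = {}  # token -> original value

    def tokenize(self, value: str) -> str:
        if value not in self._forward:
            token = "tok_" + secrets.token_hex(8)
            self._forward[value] = token
            self._reverse[token] = value
        return self._forward[value]

    def detokenize(self, token: str) -> str:
        return self._reverse[token]


print(mask("010-1234-5678"))
```

Masking keeps a readable suffix, hashing is irreversible but stable for equal inputs, and tokenizing is reversible only for whoever holds the lookup table.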

## Quick Start

### Clone repository
```bash
git clone https://github.com/zafrem/regex-vault.git
```

### Installation

```bash
pip install regex-vault
```

See [Installation Guide](docs/installation.md) for more options.

### Library Usage

```python
from regexvault import Engine, load_registry

# Load patterns from directory
registry = load_registry(paths=["patterns/"])
engine = Engine(registry)

# Validate
is_valid = engine.validate("010-1234-5678", "kr/mobile_01")

# Find PII (searches all loaded patterns)
results = engine.find("My phone: 01012345678, email: test@example.com")

# Redact
redacted = engine.redact("SSN: 900101-1234567", namespaces=["kr"])
print(redacted.redacted_text)
```
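
To make the `namespace/pattern_id` addressing used above more concrete, here is a rough standard-library sketch of how validation and namespace-filtered search could work. It does not use regex-vault and is not its implementation. The `PATTERNS` table, the `common/email_01` id, and both regexes are simplified assumptions for illustration.

```python
import re

# Hypothetical in-memory registry keyed by "<namespace>/<pattern id>".
# These regexes are simplified stand-ins, not the patterns shipped with regex-vault.
PATTERNS = {
    "kr/mobile_01": re.compile(r"01[016789]-?\d{3,4}-?\d{4}"),
    "common/email_01": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
}


def validate(text, pattern_id):
    """True when the whole text matches the pattern with the given id."""
    return PATTERNS[pattern_id].fullmatch(text) is not None


def find(text, namespaces=None):
    """Return (pattern_id, start, end, match) tuples, ordered by position."""
    hits = []
    for pattern_id, regex in PATTERNS.items():
        namespace = pattern_id.split("/", 1)[0]
        if namespaces is not None and namespace not in namespaces:
            continue  # skip patterns outside the requested namespaces
        for m in regex.finditer(text):
            hits.append((pattern_id, m.start(), m.end(), m.group()))
    return sorted(hits, key=lambda hit: hit[1])


for hit in find("My phone: 01012345678, email: test@example.com"):
    print(hit)
```

Validation uses a full match against one named pattern, while search scans with every pattern in the selected namespaces.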

### CLI Usage

```bash
# Find PII
regex-vault find --text "010-1234-5678" --ns kr

# Redact PII
regex-vault redact --in input.log --out redacted.log --ns kr us

# Start server
regex-vault serve --port 8080
```

### REST API

```bash
# Start server
regex-vault serve --port 8080

# Find PII
curl -X POST http://localhost:8080/find \
  -H "Content-Type: application/json" \
  -d '{"text": "010-1234-5678", "namespaces": ["kr"]}'
```
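
The same request can be sent from Python with only the standard library. This sketch reuses the `/find` endpoint and JSON body from the `curl` example. The function names are hypothetical, and the assumption that the server replies with a JSON document is not confirmed by this README.

```python
import json
import urllib.request


def build_find_request(text, namespaces, base_url="http://localhost:8080"):
    """Build (but do not send) a POST request equivalent to the curl call."""
    payload = json.dumps({"text": text, "namespaces": namespaces}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/find",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def find_pii(text, namespaces, base_url="http://localhost:8080", timeout=5.0):
    """Send the request and decode the reply, assuming the body is JSON."""
    request = build_find_request(text, namespaces, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires a server started with: regex-vault serve --port 8080
    print(find_pii("010-1234-5678", ["kr"]))
```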

## Documentation

- [Quick Start Guide](docs/quickstart.md) - Get started quickly
- [Pattern Structure](docs/patterns.md) - Learn about pattern definitions
- [Custom Patterns](docs/custom-patterns.md) - Create your own patterns
- [Verification Functions](docs/verification.md) - Add custom validation logic
- [Configuration](docs/configuration.md) - Server and registry configuration
- [API Reference](docs/api-reference.md) - Complete API documentation
- [Supported Patterns](docs/supported-patterns.md) - Built-in pattern catalog

## Supported Pattern Types

- 📱 Phone numbers (KR, US, TW, JP, CN, IN)
- 🆔 National IDs (SSN, RRN, Aadhaar, etc.)
- 📧 Email addresses
- 🏦 Bank accounts & IBANs (with Mod-97 verification)
- 💳 Credit cards (Visa, MasterCard, Amex, etc.)
- 🛂 Passport numbers
- 📍 Physical addresses
- 🌐 IP addresses & URLs

**Total**: 60+ patterns across 7 namespaces (Common, KR, US, TW, JP, CN, IN)

See [Supported Patterns](docs/supported-patterns.md) for the complete list.

## Verification Functions

Patterns can include verification functions for additional validation beyond regex:

```yaml
- id: iban_01
  category: iban
  pattern: '[A-Z]{2}\d{2}[A-Z0-9]{11,30}'
  verification: iban_mod97  # Validates IBAN checksum
```

Built-in verification functions:
- `iban_mod97` - IBAN Mod-97 checksum validation
- `luhn` - Luhn algorithm for credit cards

You can also register custom verification functions. See [Verification Functions](docs/verification.md) for details.
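
Both built-in checks are standard, publicly documented algorithms. The sketch below shows what such checks compute in plain Python. The function names `luhn_valid` and `iban_mod97_valid` are hypothetical, and the code is not taken from regex-vault.

```python
def luhn_valid(number: str) -> bool:
    """Luhn check: double every second digit from the right, sum, test mod 10."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the doubled value
        total += d
    return total % 10 == 0


def iban_mod97_valid(iban: str) -> bool:
    """IBAN check: move the first four characters to the end, map letters
    to numbers (A=10 ... Z=35), and require the result mod 97 to equal 1."""
    s = iban.replace(" ", "").upper()
    if len(s) < 5 or not (s.isascii() and s.isalnum()):
        return False
    rearranged = s[4:] + s[:4]
    # int(c, 36) maps '0'-'9' to 0-9 and 'A'-'Z' to 10-35.
    numeric = "".join(str(int(c, 36)) for c in rearranged)
    return int(numeric) % 97 == 1


print(luhn_valid("4111 1111 1111 1111"))
print(iban_mod97_valid("GB82 WEST 1234 5698 7654 32"))
```

A regex only constrains the shape of a candidate string. A checksum step like these rejects strings that have the right shape but are not valid numbers, which reduces false positives.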

## Performance

- **Latency**: p95 < 10ms for 1KB text with single namespace
- **Throughput**: 500+ RPS on 1 vCPU, 512MB RAM
- **Scalability**: Handles 1k+ patterns and 1k+ concurrent requests
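
To check a latency figure such as the p95 above on your own hardware, a small timing harness is enough. This is a sketch under stated assumptions: `p95_latency_ms` is a hypothetical helper, and the workload is a single stand-in regex over roughly 1 KB of text, not the regex-vault engine.

```python
import re
import statistics
import time


def p95_latency_ms(func, payload, runs=200):
    """Call func(payload) `runs` times and return the 95th percentile in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        func(payload)
        samples.append((time.perf_counter() - start) * 1000.0)
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    return statistics.quantiles(samples, n=100)[94]


# Stand-in workload: one phone-like regex over exactly 1024 characters.
PHONE = re.compile(r"\d{3}-\d{4}-\d{4}")
text_1kb = ("call 010-1234-5678 now. " * 43)[:1024]

print(f"p95: {p95_latency_ms(PHONE.findall, text_1kb):.3f} ms")
```

To measure the library itself, pass a callable that wraps the engine call you care about in place of `PHONE.findall`.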

## Security & Privacy

- No raw PII is logged (only hashes/metadata)
- TLS support for server
- Configurable rate limiting
- Designed to support GDPR/CCPA-compliant operations
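
As an illustration of the first point, a log entry can carry a keyed digest and metadata instead of the matched value. This is a minimal sketch. The helper names, the field layout, and the use of HMAC-SHA-256 are assumptions, not a description of how regex-vault logs.

```python
import hashlib
import hmac
import logging

logger = logging.getLogger("pii_audit")  # hypothetical logger name


def log_safe_fields(pattern_id, matched_text, start, end, key):
    """Describe a match without including the matched text itself."""
    digest = hmac.new(key, matched_text.encode("utf-8"), hashlib.sha256).hexdigest()
    return {
        "pattern": pattern_id,
        "span": [start, end],
        "length": len(matched_text),
        "digest": digest[:16],  # truncated keyed digest, enough to correlate events
    }


def log_match(pattern_id, matched_text, start, end, key):
    """Emit one log line that contains only metadata and the digest."""
    logger.info("pii_match %s", log_safe_fields(pattern_id, matched_text, start, end, key))


print(log_safe_fields("kr/mobile_01", "010-1234-5678", 10, 23, b"demo-key"))
```

A keyed digest lets operators correlate repeated occurrences of the same value while making it harder to recover short values, such as phone numbers, by brute force without the key.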

## Development

```bash
# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Format code
black src/ tests/
ruff check src/ tests/

# Validate patterns
python -c "from regexvault import load_registry; load_registry(validate_examples=True)"
```

## Docker

```bash
# Build
docker build -t regex-vault:latest .

# Run
docker run -p 8080:8080 -v "$(pwd)/patterns:/app/patterns" regex-vault:latest
```

## License

MIT License - see [LICENSE](LICENSE) file for details.

## Contributing

Contributions are welcome! Please read our contributing guidelines and submit pull requests.

## Support

- 📖 [Documentation](docs/)
- 🐛 [Issue Tracker](https://github.com/zafrem/regex-vault/issues)
- 💬 [Discussions](https://github.com/zafrem/regex-vault/discussions)

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "regex-vault",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "pii, detection, redaction, regex, privacy, gdpr",
    "author": null,
    "author_email": "zafrem <zafrem@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/10/b9/f57f467f2c21bc25d224414941ebb3081e55c0875daf56e3eaaad8c2d671/regex_vault-0.1.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A general-purpose engine that detects and masks personal information by country and information type",
    "version": "0.1.2",
    "project_urls": {
        "Homepage": "https://github.com/zafrem/regex-vault",
        "Issues": "https://github.com/zafrem/regex-vault/issues",
        "Repository": "https://github.com/zafrem/regex-vault"
    },
    "split_keywords": [
        "pii",
        " detection",
        " redaction",
        " regex",
        " privacy",
        " gdpr"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "d5cf09b66d1b50c2d419218604dcf8a43af461c9aba9320a7a139ec5fdf83465",
                "md5": "61fc51913196e6627298b81b7f4a9f6b",
                "sha256": "f7f471cff64b0d3fe4caabbb08bd030aa5ebcc9fdb61badd4fb4e0402045c842"
            },
            "downloads": -1,
            "filename": "regex_vault-0.1.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "61fc51913196e6627298b81b7f4a9f6b",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 19114,
            "upload_time": "2025-10-08T11:44:43",
            "upload_time_iso_8601": "2025-10-08T11:44:43.534411Z",
            "url": "https://files.pythonhosted.org/packages/d5/cf/09b66d1b50c2d419218604dcf8a43af461c9aba9320a7a139ec5fdf83465/regex_vault-0.1.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "10b9f57f467f2c21bc25d224414941ebb3081e55c0875daf56e3eaaad8c2d671",
                "md5": "8932bc2be1d09ca2532e11b49a28588c",
                "sha256": "bca952039daf3441ebaa57fe13a0cdabc4dc5fa4c60eeff09e51bd9f1fcd5015"
            },
            "downloads": -1,
            "filename": "regex_vault-0.1.2.tar.gz",
            "has_sig": false,
            "md5_digest": "8932bc2be1d09ca2532e11b49a28588c",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 25200,
            "upload_time": "2025-10-08T11:44:44",
            "upload_time_iso_8601": "2025-10-08T11:44:44.627473Z",
            "url": "https://files.pythonhosted.org/packages/10/b9/f57f467f2c21bc25d224414941ebb3081e55c0875daf56e3eaaad8c2d671/regex_vault-0.1.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-08 11:44:44",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "zafrem",
    "github_project": "regex-vault",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "pyyaml",
            "specs": [
                [
                    ">=",
                    "6.0"
                ]
            ]
        },
        {
            "name": "jsonschema",
            "specs": [
                [
                    ">=",
                    "4.17.0"
                ]
            ]
        },
        {
            "name": "click",
            "specs": [
                [
                    ">=",
                    "8.1.0"
                ]
            ]
        },
        {
            "name": "fastapi",
            "specs": [
                [
                    ">=",
                    "0.104.0"
                ]
            ]
        },
        {
            "name": "uvicorn",
            "specs": [
                [
                    ">=",
                    "0.24.0"
                ]
            ]
        },
        {
            "name": "grpcio",
            "specs": [
                [
                    ">=",
                    "1.59.0"
                ]
            ]
        },
        {
            "name": "grpcio-tools",
            "specs": [
                [
                    ">=",
                    "1.59.0"
                ]
            ]
        },
        {
            "name": "prometheus-client",
            "specs": [
                [
                    ">=",
                    "0.19.0"
                ]
            ]
        },
        {
            "name": "pydantic",
            "specs": [
                [
                    ">=",
                    "2.0.0"
                ]
            ]
        },
        {
            "name": "python-multipart",
            "specs": [
                [
                    ">=",
                    "0.0.6"
                ]
            ]
        }
    ],
    "lcname": "regex-vault"
}
        