# regex-vault
**regex-vault** is a general-purpose engine that detects and masks personal information (mobile phone numbers, social security numbers, email addresses, etc.) by **country and information type**. It is driven by pattern files and can be used as a library or run as a daemon (server).
## Features
- 🌍 **Global Support**: Patterns organized by country (ISO2) and information type
- 🔍 **Detection**: Find PII in text using multiple patterns
- ✅ **Validation**: Validate text against specific patterns with optional verification functions
- 🔒 **Redaction**: Mask, hash, or tokenize sensitive information
- 🚀 **Multiple Interfaces**: Library API, CLI, and HTTP/gRPC server
- ⚡ **High Performance**: p95 < 10ms for 1KB text (single namespace)
- 🔄 **Hot Reload**: Non-disruptive pattern reloading
- 📊 **Observability**: Prometheus metrics and health checks
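
The three redaction modes differ in what survives. Masking keeps a readable suffix, hashing is deterministic (the same input always gives the same output, so records can still be joined), and tokenizing is random but reversible through a lookup table. The standard-library sketch below illustrates only that distinction. The `mask`/`redact` helpers, the `TOK_` token format and the phone regex are illustrative assumptions, not regex-vault's API (the library's own entry point is `engine.redact`).

```python
import hashlib
import re
import secrets
from typing import Dict, Optional

# Illustrative pattern only (Korean mobile number, optional hyphens);
# not necessarily the pattern regex-vault ships.
MOBILE = re.compile(r"(?<!\d)01[016789]-?\d{3,4}-?\d{4}(?!\d)")

STRATEGIES = {"mask", "hash", "tokenize"}


def mask(value: str, keep: int = 4, char: str = "*") -> str:
    """Replace every character except the last `keep` with `char`."""
    if len(value) <= keep:
        return char * len(value)
    return char * (len(value) - keep) + value[-keep:]


def redact(text: str, strategy: str, vault: Optional[Dict[str, str]] = None) -> str:
    """Rewrite every MOBILE match in `text` using the chosen strategy (hypothetical helper)."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")

    def replace(m):
        value = m.group(0)
        if strategy == "mask":
            return mask(value)
        if strategy == "hash":
            # Deterministic digest, truncated to 16 hex characters.
            return hashlib.sha256(value.encode("utf-8")).hexdigest()[:16]
        # tokenize: a random token, reversible only via the vault mapping.
        token = "TOK_" + secrets.token_hex(4)
        if vault is not None:
            vault[token] = value
        return token

    return MOBILE.sub(replace, text)


vault: Dict[str, str] = {}
print(redact("call 010-1234-5678", "mask"))             # call *********5678
print(redact("call 010-1234-5678", "tokenize", vault))  # call TOK_<8 random hex digits>
```
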
## Quick Start
### Clone repository
```bash
git clone https://github.com/zafrem/regex-vault.git
```
### Installation
```bash
pip install regex-vault
```
Requires Python 3.9 or later. See [Installation Guide](docs/installation.md) for more options.
### Library Usage
```python
from regexvault import Engine, load_registry
# Load patterns from directory
registry = load_registry(paths=["patterns/"])
engine = Engine(registry)
# Validate
is_valid = engine.validate("010-1234-5678", "kr/mobile_01")
# Find PII (searches all loaded patterns)
results = engine.find("My phone: 01012345678, email: test@example.com")
# Redact
redacted = engine.redact("SSN: 900101-1234567", namespaces=["kr"])
print(redacted.redacted_text)
```
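
To make the `namespace/id` addressing (`kr/mobile_01`) and the `namespaces` filter concrete, here is a toy registry built on Python's `re` module. It is a hypothetical model of the behaviour described above, not regex-vault's implementation; the two regexes and the tuple result format are assumptions.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical patterns keyed "<namespace>/<id>"; illustrative only.
PATTERNS: Dict[str, re.Pattern] = {
    "kr/mobile_01": re.compile(r"(?<!\d)01[016789]-?\d{3,4}-?\d{4}(?!\d)"),
    "common/email_01": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
}

Hit = Tuple[str, str, int, int]  # (pattern_id, matched text, start, end)


def validate(text: str, pattern_id: str) -> bool:
    """True only if the *whole* string matches the named pattern."""
    return PATTERNS[pattern_id].fullmatch(text) is not None


def find(text: str, namespaces: Optional[Iterable[str]] = None) -> List[Hit]:
    """Scan all patterns, or only those in `namespaces`, ordered by position."""
    allowed = set(namespaces) if namespaces is not None else None
    hits: List[Hit] = []
    for pattern_id, regex in PATTERNS.items():
        namespace = pattern_id.split("/", 1)[0]
        if allowed is not None and namespace not in allowed:
            continue
        hits.extend((pattern_id, m.group(0), m.start(), m.end()) for m in regex.finditer(text))
    return sorted(hits, key=lambda hit: hit[2])


text = "My phone: 01012345678, email: test@example.com"
print(validate("010-1234-5678", "kr/mobile_01"))  # True
print([hit[1] for hit in find(text)])             # ['01012345678', 'test@example.com']
print([hit[0] for hit in find(text, ["kr"])])     # ['kr/mobile_01']
```
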
### CLI Usage
```bash
# Find PII
regex-vault find --text "010-1234-5678" --ns kr
# Redact PII
regex-vault redact --in input.log --out redacted.log --ns kr us
# Start server
regex-vault serve --port 8080
```
### REST API
```bash
# Start server
regex-vault serve --port 8080
# Find PII
curl -X POST http://localhost:8080/find \
-H "Content-Type: application/json" \
-d '{"text": "010-1234-5678", "namespaces": ["kr"]}'
```
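
The same request can be sent from Python using only the standard library. This sketch mirrors the endpoint and JSON body of the `curl` call above. The helper names are hypothetical, and because the response format is not documented here, the example only decodes the reply as generic JSON.

```python
import json
import urllib.request
from typing import List


def build_find_request(
    text: str, namespaces: List[str], base_url: str = "http://localhost:8080"
) -> urllib.request.Request:
    """Build (without sending) the POST /find request shown in the curl example."""
    body = json.dumps({"text": text, "namespaces": namespaces}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/find",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def find_remote(text: str, namespaces: List[str], base_url: str = "http://localhost:8080"):
    """Send the request and decode the JSON reply (requires a running server)."""
    request = build_find_request(text, namespaces, base_url)
    with urllib.request.urlopen(request, timeout=5) as resp:
        return json.load(resp)
```

With `regex-vault serve --port 8080` running, `find_remote("010-1234-5678", ["kr"])` returns the server's decoded JSON response.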
## Documentation
- [Quick Start Guide](docs/quickstart.md) - Get started quickly
- [Pattern Structure](docs/patterns.md) - Learn about pattern definitions
- [Custom Patterns](docs/custom-patterns.md) - Create your own patterns
- [Verification Functions](docs/verification.md) - Add custom validation logic
- [Configuration](docs/configuration.md) - Server and registry configuration
- [API Reference](docs/api-reference.md) - Complete API documentation
- [Supported Patterns](docs/supported-patterns.md) - Built-in pattern catalog
## Supported Pattern Types
- 📱 Phone numbers (KR, US, TW, JP, CN, IN)
- 🆔 National IDs (SSN, RRN, Aadhaar, etc.)
- 📧 Email addresses
- 🏦 Bank accounts & IBANs (with Mod-97 verification)
- 💳 Credit cards (Visa, MasterCard, Amex, etc.)
- 🛂 Passport numbers
- 📍 Physical addresses
- 🌐 IP addresses & URLs

**Total**: 60+ patterns across 7 namespaces (Common, KR, US, TW, JP, CN, IN)

See [Supported Patterns](docs/supported-patterns.md) for the complete list.
## Verification Functions
Patterns can include verification functions for additional validation beyond regex:
```yaml
- id: iban_01
  category: iban
  pattern: '[A-Z]{2}\d{2}[A-Z0-9]{11,30}'
  verification: iban_mod97  # Validates IBAN checksum
```
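
The two stages in that definition can be sketched in plain Python: the regex only proposes candidates, and the checksum discards strings that merely look like IBANs. The check follows the standard IBAN procedure (ISO 13616, MOD 97-10): move the first four characters to the end, map letters A to Z onto 10 to 35, and require the resulting number mod 97 to equal 1. The function names are hypothetical; regex-vault's own `iban_mod97` may differ in details such as whitespace handling.

```python
import re
from typing import List

# Candidate regex copied from the YAML definition above.
IBAN_CANDIDATE = re.compile(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}")


def iban_mod97(candidate: str) -> bool:
    """Standard IBAN check: rotate, convert letters to numbers, test mod 97 == 1."""
    rotated = candidate[4:] + candidate[:4]
    # int(ch, 36) maps '0'-'9' to 0-9 and 'A'-'Z' to 10-35.
    numeric = "".join(str(int(ch, 36)) for ch in rotated)
    return int(numeric) % 97 == 1


def find_ibans(text: str) -> List[str]:
    """Regex proposes candidates; the checksum rejects look-alikes."""
    return [m.group(0) for m in IBAN_CANDIDATE.finditer(text) if iban_mod97(m.group(0))]


print(find_ibans("valid GB82WEST12345698765432, typo GB82WEST12345698765433"))
# ['GB82WEST12345698765432']
```
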
Built-in verification functions:
- `iban_mod97` - IBAN Mod-97 checksum validation
- `luhn` - Luhn algorithm for credit cards

You can also register custom verification functions. See [Verification Functions](docs/verification.md) for details.
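
For reference, a Luhn check fits in a few lines, and a name-to-function registry is one common way to let a YAML entry such as `verification: luhn` refer to code. The `register` decorator and `VERIFIERS` dictionary below are hypothetical; regex-vault's actual registration API is described in docs/verification.md.

```python
from typing import Callable, Dict

# Hypothetical registry mapping names used in pattern files to functions.
VERIFIERS: Dict[str, Callable[[str], bool]] = {}


def register(name: str):
    """Decorator that stores a verification function under `name`."""
    def decorator(fn: Callable[[str], bool]) -> Callable[[str], bool]:
        VERIFIERS[name] = fn
        return fn
    return decorator


@register("luhn")
def luhn(value: str) -> bool:
    """Luhn checksum; non-digit characters (spaces, hyphens) are ignored."""
    digits = [int(c) for c in value if c in "0123456789"]
    if len(digits) < 2:
        return False
    total = 0
    # From the rightmost digit, double every second digit; subtract 9 if > 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


print(VERIFIERS["luhn"]("4111 1111 1111 1111"))  # True (a well-known test card number)
print(VERIFIERS["luhn"]("4111 1111 1111 1112"))  # False
```
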
## Performance
- **Latency**: p95 < 10ms for 1KB text with single namespace
- **Throughput**: 500+ RPS on 1 vCPU, 512MB RAM
- **Scalability**: Handles 1k+ patterns and 1k+ concurrent requests
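
Latency figures like these depend on hardware, pattern count and input. To check a p95 number on your own machine, a rough harness looks like the sketch below. It times a stand-in scan built from two illustrative regexes, not regex-vault itself; pass `engine.find` instead of `scan` to measure the library.

```python
import re
import statistics
import time
from typing import Callable, List

# Illustrative stand-in for a single-namespace scan (not regex-vault's patterns).
PATTERNS = [
    re.compile(r"(?<!\d)01[016789]-?\d{3,4}-?\d{4}(?!\d)"),
    re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
]


def scan(text: str) -> List[str]:
    return [m.group(0) for p in PATTERNS for m in p.finditer(text)]


def p95_ms(fn: Callable[[str], object], payload: str, runs: int = 500) -> float:
    """Call fn(payload) `runs` times and return the 95th-percentile latency in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(payload)
        samples.append((time.perf_counter() - start) * 1000.0)
    # quantiles(n=100) returns the 99 percentile cut points; index 94 is p95.
    return statistics.quantiles(samples, n=100)[94]


# Roughly 1 KB of text containing a few PII-like strings.
payload = ("call 010-1234-5678 or write to a@example.com. " * 30)[:1024]
print(f"p95: {p95_ms(scan, payload):.3f} ms")
```
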
## Security & Privacy
- No raw PII is logged (only hashes and metadata)
- TLS support for the server
- Configurable rate limiting
- Designed to support GDPR/CCPA compliance requirements
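
A common way to honour a "hashes and metadata only" logging rule is to log a keyed hash (HMAC) of each match instead of the value. A plain, unkeyed hash of a phone number can be reversed by brute force because the input space is small, whereas an HMAC cannot be recomputed without the key. Whether regex-vault uses HMAC, plain hashing or something else is not stated here; the key handling, record fields and `log_match` helper below are assumptions.

```python
import hashlib
import hmac
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("pii-audit")

# Hypothetical key for illustration; load a real one from a secret store.
LOG_KEY = b"replace-with-a-secret-key"


def fingerprint(value: str, key: bytes = LOG_KEY) -> str:
    """Keyed SHA-256 digest: equal values correlate, but the raw value is not logged."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def log_match(pattern_id: str, value: str, start: int, end: int) -> dict:
    """Log only metadata plus a fingerprint for one detected match."""
    record = {
        "pattern_id": pattern_id,
        "fingerprint": fingerprint(value),
        "span": (start, end),
        "length": len(value),
    }
    logger.info("pii match: %s", record)
    return record


log_match("kr/mobile_01", "010-1234-5678", 10, 23)
```
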
## Development
```bash
# Install with dev dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Format and lint code
black src/ tests/
ruff check src/ tests/
# Validate patterns
python -c "from regexvault import load_registry; load_registry(validate_examples=True)"
```
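
The `validate_examples=True` flag suggests that each pattern carries sample strings checked at load time. The sketch below shows what such a check typically involves: every positive example must fully match and every negative example must not. The `examples`/`match`/`nomatch` field names are hypothetical; consult docs/patterns.md for the real pattern schema.

```python
import re
from typing import Dict, List

# Hypothetical pattern definitions with embedded examples.
PATTERNS: List[Dict] = [
    {
        "id": "kr/mobile_01",
        "pattern": r"01[016789]-?\d{3,4}-?\d{4}",
        "examples": {"match": ["010-1234-5678", "01012345678"], "nomatch": ["02-123-4567"]},
    },
]


def validate_examples(patterns: List[Dict]) -> List[str]:
    """Return human-readable failures; an empty list means every example passes."""
    failures = []
    for p in patterns:
        try:
            regex = re.compile(p["pattern"])
        except re.error as exc:
            failures.append(f"{p['id']}: invalid regex ({exc})")
            continue
        examples = p.get("examples", {})
        for s in examples.get("match", []):
            if regex.fullmatch(s) is None:
                failures.append(f"{p['id']}: expected a match for {s!r}")
        for s in examples.get("nomatch", []):
            if regex.fullmatch(s) is not None:
                failures.append(f"{p['id']}: unexpected match for {s!r}")
    return failures


print(validate_examples(PATTERNS))  # []
```
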
## Docker
```bash
# Build
docker build -t regex-vault:latest .
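# Run (mounts the local patterns/ directory; older Docker releases require an absolute host path for -v)
docker run -p 8080:8080 -v "$(pwd)/patterns:/app/patterns" regex-vault:latest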
```
## License
MIT License - see [LICENSE](LICENSE) file for details.
## Contributing
Contributions are welcome! Please read our contributing guidelines and submit pull requests.
## Support
- 📖 [Documentation](docs/)
- 🐛 [Issue Tracker](https://github.com/zafrem/regex-vault/issues)
- 💬 [Discussions](https://github.com/zafrem/regex-vault/discussions)