# SecretSentry 🛡️
> **The first AI-powered sensitive data scanner built for modern data science and web development workflows**
SecretSentry is an advanced sensitive data scanner that goes beyond traditional secret detection. Built specifically for **Jupyter notebooks**, **web development**, and **data science workflows**, it combines **machine learning** with regex patterns to intelligently filter false positives while detecting API keys, PII, credentials, and other sensitive information.
## 🎯 **Why SecretSentry?**
### **Built for Modern Workflows**
- 🔬 **Jupyter Notebook Specialist**: First scanner designed for `.ipynb` files
- 🤖 **AI-Powered Detection**: Machine learning models reduce false positives by up to 80%
- 🧠 **Smart Context Awareness**: Understands code context, not just pattern matching
- 🌐 **Multi-Environment**: CLI, Jupyter notebooks, and Python scripts
- 🎛️ **Interactive Analysis**: Built-in widgets for exploring findings
### **Comprehensive Detection**
- 🔑 **50+ Built-in Patterns**: API keys, tokens, secrets, credentials
- 👤 **PII Detection**: SSNs, credit cards, phone numbers, emails
- 💰 **Financial Data**: Salary information, bank accounts, routing numbers
- 🌍 **Geographic Data**: Coordinates, IP addresses, postal codes
- 🏥 **Sensitive Categories**: Ethnic data, religious information, medical records
### **Advanced Features**
- 🛡️ **Smart Sanitization**: Context-aware gibberish replacement
- 🤖 **Ensemble Detection**: Combines regex + ML for maximum accuracy
- 📊 **Rich Visualizations**: Charts and statistics (with matplotlib/seaborn)
- 📈 **Pandas Integration**: Export to DataFrames for analysis
- 🎯 **Confidence Scoring**: ML predictions with 0.0-1.0 confidence scores
- 🔄 **CI/CD Ready**: Perfect for automation and pipelines
- 🖥️ **Cross-Platform**: Works on macOS, Windows, and Linux
## 🚀 **Quick Start**
### **Installation**
```bash
# Basic installation (regex-only detection)
pip install secretsentry
# With machine learning capabilities (extras are quoted so shells like zsh don't expand the brackets)
pip install "secretsentry[ml]"
# Advanced ML with transformers (best accuracy)
pip install "secretsentry[ml-advanced]"
# Full installation with all features
pip install "secretsentry[full]"
# For Jupyter notebooks only
pip install "secretsentry[jupyter]"
```
### **Basic Usage**
```python
from secretsentry import SecretSentry, quick_scan, quick_ml_scan
# Quick scan with regex detection
scanner = quick_scan("./my_project")
# Quick scan with AI/ML enhancement (recommended)
scanner = quick_ml_scan("./my_project", confidence_threshold=0.7)
# Manual scanning with ML capabilities
scanner = SecretSentry(
    use_ml_detection=True,
    ml_confidence_threshold=0.7,
    ml_ensemble_mode=True  # Combines regex + ML
)
findings = scanner.scan_directory("./my_project")
scanner.display_findings()
# Access ML-specific results
ml_findings = scanner.get_ml_findings()
high_confidence = scanner.get_high_confidence_findings(0.8)
# Sanitize files (creates backups automatically)
stats = scanner.sanitize_files(dry_run=True) # Preview changes
stats = scanner.sanitize_files() # Actually sanitize
```
### **Command Line**
```bash
# Basic regex scanning
secretsentry scan ./my_project --display
# AI-enhanced scanning (recommended)
secretsentry scan ./my_project --ml --display
# Quick ML scan with optimal settings
secretsentry scan ./my_project --ml-quick
# ML-only detection with custom confidence
secretsentry scan ./my_project --ml-only --ml-confidence 0.8
# Check ML requirements
secretsentry scan --check-ml
# Export findings with ML metadata
secretsentry scan ./my_project --ml --export findings.json
# Sanitize files (with backup)
secretsentry scan ./my_project --sanitize --dry-run
secretsentry scan ./my_project --sanitize
# List all detection patterns
secretsentry list-patterns
```
## 🤖 **AI-Powered Detection**
SecretSentry's machine learning capabilities provide **context-aware detection** that dramatically reduces false positives:
### **ML Detection Modes**
```python
# Ensemble Mode (recommended): Combines regex + ML
scanner = SecretSentry(
    use_ml_detection=True,
    ml_ensemble_mode=True,
    ml_confidence_threshold=0.7
)
# ML-Only Mode: Pure machine learning detection
scanner = SecretSentry(
    use_ml_detection=True,
    ml_ensemble_mode=False,
    ml_confidence_threshold=0.8
)
# Quick ML scan with optimal settings
scanner = quick_ml_scan("./my_project")
```
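To make the ensemble idea concrete, here is a minimal, self-contained sketch of "regex proposes, a scorer decides". It does not use SecretSentry's internals: `CANDIDATE`, `toy_confidence`, and `ensemble_scan` are hypothetical names, and the scoring rule is a toy stand-in for a trained model.

```python
import re

# Regex stage: propose candidates from simple `name = "value"` assignments.
CANDIDATE = re.compile(
    r"\b(api_key|secret|token)\s*=\s*[\"']([^\"']+)[\"']", re.IGNORECASE
)


def toy_confidence(value: str) -> float:
    """Toy stand-in for a trained model: reward length and character variety."""
    if len(value) < 8:
        return 0.0
    classes = [
        any(c.islower() for c in value),
        any(c.isupper() for c in value),
        any(c.isdigit() for c in value),
    ]
    return min(1.0, 0.25 * sum(classes) + len(value) / 64)


def ensemble_scan(text: str, scorer=toy_confidence, threshold: float = 0.7):
    """Keep only the regex candidates whose score reaches the threshold."""
    findings = []
    for match in CANDIDATE.finditer(text):
        score = scorer(match.group(2))
        if score >= threshold:
            findings.append((match.group(1), match.group(2), round(score, 2)))
    return findings


sample = 'api_key = "xK8mP9nQ4vL7wR2ZaB3cD4eF"\ntoken = "placeholder"\n'
print(ensemble_scan(sample))
```

Raising `threshold` trades recall for precision, which is presumably why the examples in this README use a higher value for ML-only mode than for ensemble mode.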
### **ML Features**
- 🧠 **Context Understanding**: Analyzes surrounding code context, not just patterns
- 📊 **Confidence Scoring**: Every ML detection includes a 0.0-1.0 confidence score
- 🔬 **Feature Extraction**: Text entropy, keyword analysis, pattern recognition
- 🏋️ **Multiple Models**: Logistic Regression, Isolation Forest, optional Transformers
- 💾 **Model Caching**: Trained models cached for faster subsequent scans
- 🖥️ **Local Processing**: All ML inference happens on your machine (no data sent externally)
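Text entropy is one of the features listed above. As an illustration of what such a feature measures, the sketch below computes Shannon entropy in bits per character. It is a generic formula, not SecretSentry's actual feature extractor, and `shannon_entropy` is a hypothetical helper name.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    # Each character contributes p * log2(1 / p), with p its relative frequency.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Random-looking tokens score higher than repeated characters or ordinary words.
for value in ["aaaaaaaa", "password", "xK8mP9nQ4vL7wR2Z"]:
    print(f"{value!r}: {shannon_entropy(value):.2f} bits/char")
```

High entropy alone is not proof of a secret (compressed or base64-encoded images also score high), which is why it is typically combined with keyword and context features.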
### **ML Requirements**
```bash
# Check what's available on your system
secretsentry scan --check-ml
# Install ML dependencies
pip install "secretsentry[ml]"           # Basic ML (scikit-learn)
pip install "secretsentry[ml-advanced]"  # Advanced ML (transformers)
```
## 🎓 **Jupyter Notebook Integration**
SecretSentry shines in Jupyter environments with **zero false positives** from notebook metadata:
```python
# In Jupyter notebook
from secretsentry import quick_scan, quick_ml_scan
# Quick ML scan with visualizations
scanner = quick_ml_scan("./test_data", show_plots=True)
# Interactive exploration with ML metadata
scanner.create_interactive_viewer()
# Data analysis with ML findings
df = scanner.to_dataframe(include_ml_findings=True)
summary = df.groupby(['pattern_type', 'detection_method']).size()
# Analyze confidence scores
ml_df = df[df['detection_method'] == 'ml']
confidence_analysis = ml_df['confidence_score'].describe()
```
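Notebook metadata stops being a source of false positives once a scanner reads only cell sources, because an `.ipynb` file is plain JSON. The sketch below shows that idea using the standard library only. It illustrates the notebook format and is not SecretSentry's parser; `iter_code_cell_sources` is a hypothetical helper.

```python
import json


def iter_code_cell_sources(notebook_json: str):
    """Yield (cell_index, source_text) for each code cell in a notebook."""
    notebook = json.loads(notebook_json)
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # nbformat stores source either as one string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        yield index, source


example = json.dumps({
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": 1, "source": ["API_KEY = 'not-a-real-key'\n"]},
    ],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 5,
})

for index, source in iter_code_cell_sources(example):
    print(index, source.strip())
```

Keys such as `"cell_type"` and `"metadata"` never reach the pattern matcher in this approach. Cell outputs are also skipped here, which a real scanner may or may not want.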
## 📊 **What Makes It Special**
### **AI-Enhanced Accuracy**
**Traditional regex scanners** flag these as secrets:
```
❌ aws_secret_key: iVBORw0KGgoAAAANSUhEUgAABKYAAAMW... # Just a PNG image!
❌ api_key: "cell_type": "code" # Notebook metadata!
❌ secret: #3498db # CSS color!
❌ token: "placeholder_for_testing" # Test data!
```
**SecretSentry with ML** understands context and only reports **real secrets**:
```
✅ aws_secret_key: AKIAIOSFODNN7EXAMPLE (confidence: 0.95)
✅ stripe_key: sk_live_1234567890abcdef123456789 (confidence: 0.89)
✅ database_url: postgresql://user:password@localhost/db (confidence: 0.92)
```
**ML Advantages:**
- 🎯 **Context Awareness**: Understands surrounding code patterns
- 📊 **Confidence Scoring**: Know how certain each detection is
- 🧠 **Learning**: Improves over time with usage patterns
- 🛡️ **Adaptive**: Handles new secret formats without regex updates
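The false positives shown earlier in this section can also be caught by simple hand-written checks, which helps explain what a context-aware model has to learn. The sketch below is a rule-based illustration only, not SecretSentry's ML logic. `benign_reason` and the placeholder word list are assumptions made for this example.

```python
import re

PNG_BASE64_PREFIX = "iVBORw0KGgo"  # base64 encoding of the PNG file signature
CSS_HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})")
PLACEHOLDER_WORDS = ("placeholder", "dummy", "changeme")  # assumed word list


def benign_reason(value: str):
    """Return a short reason if `value` looks benign, otherwise None."""
    if value.startswith(PNG_BASE64_PREFIX):
        return "embedded PNG image"
    if CSS_HEX_COLOR.fullmatch(value):
        return "CSS color"
    if any(word in value.lower() for word in PLACEHOLDER_WORDS):
        return "placeholder text"
    return None


candidates = [
    "iVBORw0KGgoAAAANSUhEUgAABKYAAAMW",
    "#3498db",
    "placeholder_for_testing",
    "sk_live_1234567890abcdef123456789",
]
for candidate in candidates:
    print(candidate[:20], "->", benign_reason(candidate))
```

Rules like these are brittle (every new benign format needs a new rule), which is the gap a learned classifier is meant to close.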
### **Smart Sanitization**
SecretSentry doesn't just find secrets; it **fixes them safely**:
```python
# Before sanitization
API_KEY = "sk_live_1234567890abcdef"
employee_ssn = "123-45-6789"
coordinates = "40.7128, -74.0060"
# After sanitization (context-aware gibberish)
API_KEY = "sk_live_xK8mP9nQ4vL7wR2Z"
employee_ssn = "456-78-9123"
coordinates = "38.8951, -77.0364"
```
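One way to produce replacements of this kind is to swap each character for a random one of the same class while leaving separators and a known prefix intact. The sketch below shows that technique under stated assumptions. It is not SecretSentry's sanitizer, and `scramble_preserving_format` with its `keep_prefix` and `seed` parameters is hypothetical.

```python
import random
import string


def scramble_preserving_format(value: str, keep_prefix: int = 0, seed=None) -> str:
    """Replace letters and digits after `keep_prefix` with random ones of the same class."""
    rng = random.Random(seed)
    scrambled = []
    for index, char in enumerate(value):
        if index < keep_prefix:
            scrambled.append(char)  # keep e.g. the "sk_live_" prefix
        elif char in string.digits:
            scrambled.append(rng.choice(string.digits))
        elif char in string.ascii_lowercase:
            scrambled.append(rng.choice(string.ascii_lowercase))
        elif char in string.ascii_uppercase:
            scrambled.append(rng.choice(string.ascii_uppercase))
        else:
            scrambled.append(char)  # keep separators such as "-" or "_"
    return "".join(scrambled)


print(scramble_preserving_format("sk_live_1234567890abcdef", keep_prefix=8))
print(scramble_preserving_format("123-45-6789"))
```

Keeping the shape means downstream code that validates formats still runs after sanitization. The `seed` parameter exists only to make demos reproducible, and a real sanitizer would more likely draw from the `secrets` module.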
## 🔧 **Advanced Usage**
### **Custom Patterns**
```python
# Add organization-specific patterns
custom_patterns = {
    'employee_id': r'EMP-\d{6}',
    'project_code': r'PROJ-[A-Z]{3}-\d{4}',
    'internal_api': r'internal_key_[a-zA-Z0-9]{32}'
}
scanner = SecretSentry(custom_patterns=custom_patterns)
```
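Before handing custom patterns to a scanner, it can help to check them against known good and bad samples with the standard `re` module. The sketch below does that for the three patterns above. The sample strings and the `check_patterns` helper are made up for illustration.

```python
import re

custom_patterns = {
    "employee_id": r"EMP-\d{6}",
    "project_code": r"PROJ-[A-Z]{3}-\d{4}",
    "internal_api": r"internal_key_[a-zA-Z0-9]{32}",
}

# Hypothetical samples: (text that should match, text that should not).
samples = {
    "employee_id": ("owner = EMP-123456", "owner = EMP-12"),
    "project_code": ("code: PROJ-ABC-2024", "code: PROJ-abc-2024"),
    "internal_api": ("key=internal_key_" + "a1B2" * 8, "key=internal_key_short"),
}


def check_patterns(patterns: dict, cases: dict) -> list:
    """Return a list of human-readable failures (empty when all cases pass)."""
    failures = []
    for name, (positive, negative) in cases.items():
        regex = re.compile(patterns[name])
        if not regex.search(positive):
            failures.append(f"{name}: missed {positive!r}")
        if regex.search(negative):
            failures.append(f"{name}: wrongly matched {negative!r}")
    return failures


print(check_patterns(custom_patterns, samples) or "all patterns behave as expected")
```

Note that `EMP-\d{6}` also matches the first six digits of a longer run such as `EMP-1234567`. Adding `\b` anchors is a common fix when that matters.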
### **CI/CD Integration**
```python
#!/usr/bin/env python3
# security_check.py
import sys
from secretsentry import SecretSentry

def security_gate():
    # Use ML-enhanced detection for better accuracy in CI/CD
    scanner = SecretSentry(
        use_ml_detection=True,
        ml_ensemble_mode=True,
        ml_confidence_threshold=0.8  # Higher threshold for CI/CD
    )
    findings = scanner.scan_directory(".", show_progress=False)

    if findings:
        print(f"❌ SECURITY CHECK FAILED: {len(findings)} secrets found")
        # Show high-confidence ML findings first
        if scanner.use_ml_detection:
            ml_findings = scanner.get_ml_findings()
            high_conf = scanner.get_high_confidence_findings(0.9)
            print(f"🤖 ML Analysis: {len(ml_findings)} ML findings, {len(high_conf)} high confidence")
        scanner.display_findings(max_display=10)
        return 1
    else:
        print("✅ SECURITY CHECK PASSED: No secrets detected")
        return 0


if __name__ == "__main__":
    sys.exit(security_gate())
```
**CI/CD CLI Usage:**
```bash
# Basic CI/CD check
secretsentry scan . --ml --quiet || exit 1
# High-confidence only for sensitive deployments
secretsentry scan . --ml-only --ml-confidence 0.9 --quiet || exit 1
```
### **Batch Processing**
```python
# Scan multiple projects with ML
from secretsentry import SecretSentry
import os
projects = ["./frontend", "./backend", "./data-science"]
all_results = {}
for project in projects:
    if os.path.exists(project):
        # Use ML for better accuracy across different project types
        scanner = SecretSentry(
            use_ml_detection=True,
            ml_ensemble_mode=True,
            ml_confidence_threshold=0.7
        )
        findings = scanner.scan_directory(project)
        # Collect ML statistics
        ml_findings = scanner.get_ml_findings()
        all_results[project] = {
            'total_findings': len(findings),
            'ml_findings': len(ml_findings),
            'high_confidence': len(scanner.get_high_confidence_findings(0.8))
        }
        # Export detailed reports with ML metadata
        scanner.export_findings(f"{project.replace('./', '')}_security_report.json")
print("Security Summary:", all_results)
```
## 📈 **Detection Categories**
<details>
<summary><b>🔑 API Keys & Secrets (20+ patterns)</b></summary>
- AWS Access/Secret Keys
- GitHub Tokens (classic & fine-grained)
- Google API Keys
- Stripe Keys (live & test)
- Slack Tokens & Webhooks
- SendGrid API Keys
- Twilio Keys
- Mailgun Keys
- Azure Storage Keys
- Heroku API Keys
- Generic API patterns
</details>
<details>
<summary><b>💳 Financial Data (8+ patterns)</b></summary>
- Credit Cards (Visa, MasterCard, AmEx, Discover, JCB, Diners)
- Bank Account Numbers
- Routing Numbers
- IBAN & SWIFT Codes
- Salary Information
</details>
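Credit card patterns tend to match arbitrary digit runs, so a checksum is a common second step. The sketch below implements the standard Luhn check. Whether SecretSentry applies it is not stated in this README, and `luhn_is_valid` is a hypothetical helper.

```python
def luhn_is_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch in "0123456789"]
    if not 12 <= len(digits) <= 19:  # typical payment card number lengths
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(luhn_is_valid("4111 1111 1111 1111"))  # widely used test card number
print(luhn_is_valid("4111 1111 1111 1112"))  # same number with the last digit changed
```

A regex match that fails this check is very unlikely to be a real card number, so it can be dropped or given a lower confidence.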
<details>
<summary><b>👤 Personal Information (10+ patterns)</b></summary>
- Social Security Numbers
- Phone Numbers (US & International)
- Email Addresses
- Passport Numbers
- Driver's License Numbers
- Medical Record Numbers
</details>
<details>
<summary><b>🌍 Geographic Data (5+ patterns)</b></summary>
- GPS Coordinates
- IP Addresses (IPv4 & IPv6)
- MAC Addresses
- ZIP/Postal Codes
</details>
<details>
<summary><b>🏥 Sensitive Personal Data (5+ patterns)</b></summary>
- Ethnic/Racial Categories
- Religious Affiliations
- Medical Information
- Disability Status
</details>
<details>
<summary><b>🔐 Cryptographic Material (5+ patterns)</b></summary>
- Private Keys (RSA, SSH)
- Public Keys & Certificates
- JWT Tokens
- OAuth Tokens
</details>
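JWTs have a recognizable structure: three dot-separated base64url segments, the first of which decodes to a JSON header with an `alg` field. The sketch below uses that to confirm a candidate really is a JWT. It is an illustration with the standard library, not SecretSentry's detector, and `decode_jwt_header` is a hypothetical helper.

```python
import base64
import binascii
import json


def decode_jwt_header(token: str):
    """Return the decoded JWT header dict, or None if `token` is not JWT-shaped."""
    parts = token.split(".")
    if len(parts) != 3:
        return None
    header_b64 = parts[0]
    padded = header_b64 + "=" * (-len(header_b64) % 4)  # restore stripped padding
    try:
        header = json.loads(base64.urlsafe_b64decode(padded))
    except (binascii.Error, ValueError):
        return None
    return header if isinstance(header, dict) and "alg" in header else None


def _b64url(data: dict) -> str:
    """Encode a dict the way JWT segments are encoded (no padding)."""
    raw = json.dumps(data, separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Build a demo token locally; the signature segment is a dummy string.
token = ".".join([_b64url({"alg": "HS256", "typ": "JWT"}), _b64url({"sub": "demo"}), "signature"])
print(decode_jwt_header(token))
print(decode_jwt_header("not.a.jwt"))
```

The check does not verify the signature. It only separates real tokens from strings that merely contain two dots.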
## 🎛️ **Configuration**
### **Environment Variables**
```bash
# Disable progress bars
export SECRETSENTRY_NO_PROGRESS=1
# Custom config file
export SECRETSENTRY_CONFIG=/path/to/config.json
# ML model cache directory (optional)
export SECRETSENTRY_MODEL_CACHE=/path/to/ml/models
# Force ML detection on/off
export SECRETSENTRY_USE_ML=true
export SECRETSENTRY_ML_CONFIDENCE=0.7
```
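This README does not specify how these variables are parsed, so the sketch below shows only one plausible interpretation. The accepted truthy strings, the defaults, and the `read_ml_settings` helper are all assumptions.

```python
import os


def read_ml_settings(environ=os.environ) -> dict:
    """Interpret the ML-related variables; defaults here are assumptions."""
    raw_flag = environ.get("SECRETSENTRY_USE_ML", "false").strip().lower()
    use_ml = raw_flag in {"1", "true", "yes", "on"}
    confidence = float(environ.get("SECRETSENTRY_ML_CONFIDENCE", "0.7"))
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("SECRETSENTRY_ML_CONFIDENCE must be between 0.0 and 1.0")
    return {"use_ml": use_ml, "confidence": confidence}


# Passing a plain dict makes the function easy to try without touching the real environment.
print(read_ml_settings({"SECRETSENTRY_USE_ML": "true", "SECRETSENTRY_ML_CONFIDENCE": "0.8"}))
print(read_ml_settings({}))
```

Validating the threshold at startup surfaces typos such as `70` immediately, before they can silently change scan results.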
### **Configuration File**
```json
{
  "excluded_patterns": ["test_", "example_", "demo_"],
  "excluded_files": ["*.test.js", "test_*.py"],
  "excluded_dirs": ["tests", "examples", "docs"],
  "custom_patterns": {
    "company_id": "COMP-\\d{8}"
  },
  "sanitization": {
    "create_backups": true,
    "backup_suffix": ".backup"
  },
  "ml_detection": {
    "enabled": true,
    "confidence_threshold": 0.7,
    "ensemble_mode": true,
    "use_transformers": false,
    "model_cache_dir": "~/.cache/secretsentry/models"
  }
}
```
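The `excluded_files` entries look like shell-style globs and `excluded_dirs` like directory names. Assuming that reading is right, the sketch below shows how such rules could be evaluated with the standard library. SecretSentry's real matching rules may differ, and `is_excluded` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

config = {
    "excluded_files": ["*.test.js", "test_*.py"],
    "excluded_dirs": ["tests", "examples", "docs"],
}


def is_excluded(path: str, settings: dict) -> bool:
    """Return True if `path` sits in an excluded directory or matches a file glob."""
    parsed = PurePosixPath(path)
    # Any parent directory named in excluded_dirs excludes the file.
    if any(part in settings["excluded_dirs"] for part in parsed.parts[:-1]):
        return True
    # Globs are compared against the file name only, case-sensitively.
    return any(fnmatchcase(parsed.name, glob) for glob in settings["excluded_files"])


for candidate in ["src/app.test.js", "src/test_utils.py", "docs/guide.md", "src/main.py"]:
    print(candidate, "->", "skip" if is_excluded(candidate, config) else "scan")
```

Running candidate paths through a check like this before a full scan is a cheap way to confirm that exclusions are not hiding files that should be scanned.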
## ⚡ **Performance & Requirements**
### **ML Performance**
| Detection Mode | Speed | Accuracy | Memory Usage | Dependencies |
|---------------|-------|----------|--------------|--------------|
| **Regex Only** | ⚡⚡⚡⚡⚡ | ✅✅✅ | 🟢 Low | Minimal |
| **ML Basic** | ⚡⚡⚡⚡ | ✅✅✅✅ | 🟡 Medium | scikit-learn |
| **ML Advanced** | ⚡⚡⚡ | ✅✅✅✅✅ | 🔴 High | transformers |
### **System Requirements**
**Minimum (Regex-only):**
- Python 3.7+
- 50MB RAM
- Any CPU
**Recommended (ML Basic):**
- Python 3.8+
- 512MB RAM
- 2+ CPU cores
- 200MB disk space
**Optimal (ML Advanced):**
- Python 3.9+
- 2GB+ RAM
- 4+ CPU cores
- 1GB disk space
### **Installation Time**
```bash
pip install secretsentry                 # ~30 seconds
pip install "secretsentry[ml]"           # ~2 minutes
pip install "secretsentry[ml-advanced]"  # ~5 minutes (downloads models)
```
### **First Run Performance**
- **Regex detection**: Instant
- **ML Basic**: ~30 seconds (model training on first run)
- **ML Advanced**: ~2 minutes (model download + training)
- **Subsequent runs**: Fast (models cached)
## 🤝 **Contributing**
We welcome contributions! Here's how to get started:
```bash
# Clone the repository
git clone https://github.com/y2ee201/secretsentry.git
cd secretsentry
# Install development dependencies (includes ML dependencies)
pip install -e ".[full]"
pip install pytest black flake8
# Run tests (includes ML tests)
pytest tests/
# Test ML functionality specifically
python test_ml_detection.py
# Format code
black secretsentry/
flake8 secretsentry/
```
## 📝 **License**
MIT License - see [LICENSE](LICENSE) file for details.
## 🙏 **Acknowledgments**
- Inspired by [detect-secrets](https://github.com/Yelp/detect-secrets) and [truffleHog](https://github.com/dxa4481/truffleHog)
- ML capabilities powered by [scikit-learn](https://scikit-learn.org/) and [Transformers](https://huggingface.co/transformers/)
- Built for the data science and security communities
- Special thanks to all contributors and the open source community
- Grateful to the broader AI/ML community for advancing secret detection research
## 📞 **Support**
- 📖 **Documentation**: [Full docs](https://github.com/y2ee201/secretsentry#readme)
- 🐛 **Issues**: [Report bugs](https://github.com/y2ee201/secretsentry/issues)
- 💬 **Discussions**: [Community forum](https://github.com/y2ee201/secretsentry/discussions)
- 📧 **Contact**: your.email@example.com
---
**SecretSentry** - *Standing guard over your sensitive data* 🛡️
Raw data
{
"_id": null,
"home_page": null,
"name": "secretsentry",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": null,
"keywords": "security, secrets, scanner, pii, jupyter, notebook, api-keys, credentials, sanitization, privacy, devops, ci-cd, machine-learning, ml, transformers, false-positives",
"author": null,
"author_email": "Abdul Jilani <abdul.jilani@evolveailabs.com>",
"download_url": "https://files.pythonhosted.org/packages/e3/5f/169dfd0f2098cc4bb455aad8a3e53fa13689c2aa48ce95a87734d7623465/secretsentry-2.0.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Advanced sensitive data scanner with Jupyter notebook support and intelligent false positive filtering",
"version": "2.0.0",
"project_urls": {
"Bug Tracker": "https://github.com/y2ee201/secretsentry/issues",
"Documentation": "https://github.com/y2ee201/secretsentry#readme",
"Homepage": "https://github.com/y2ee201/secretsentry",
"Repository": "https://github.com/y2ee201/secretsentry.git"
},
"split_keywords": [
"security",
" secrets",
" scanner",
" pii",
" jupyter",
" notebook",
" api-keys",
" credentials",
" sanitization",
" privacy",
" devops",
" ci-cd",
" machine-learning",
" ml",
" transformers",
" false-positives"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "3ee51c5d8ac3855e28950b82926a58afd934f698ecf1ef07068f17d1058d02cb",
"md5": "c78b67e5d08b3d0f31dbe3ad6b5a6bf6",
"sha256": "694f2b82f26d10096b4e2cb0c1322e075c10a696bd65abac8b804cc2b5cdc708"
},
"downloads": -1,
"filename": "secretsentry-2.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "c78b67e5d08b3d0f31dbe3ad6b5a6bf6",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 32338,
"upload_time": "2025-08-08T09:33:08",
"upload_time_iso_8601": "2025-08-08T09:33:08.309878Z",
"url": "https://files.pythonhosted.org/packages/3e/e5/1c5d8ac3855e28950b82926a58afd934f698ecf1ef07068f17d1058d02cb/secretsentry-2.0.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "e35f169dfd0f2098cc4bb455aad8a3e53fa13689c2aa48ce95a87734d7623465",
"md5": "ed28722cc75ad03a61bec12fc689e384",
"sha256": "556070798c435564ee69e210dc27482b71aa776114f24df027b334ba174b0f69"
},
"downloads": -1,
"filename": "secretsentry-2.0.0.tar.gz",
"has_sig": false,
"md5_digest": "ed28722cc75ad03a61bec12fc689e384",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 40041,
"upload_time": "2025-08-08T09:33:09",
"upload_time_iso_8601": "2025-08-08T09:33:09.676078Z",
"url": "https://files.pythonhosted.org/packages/e3/5f/169dfd0f2098cc4bb455aad8a3e53fa13689c2aa48ce95a87734d7623465/secretsentry-2.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-08 09:33:09",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "y2ee201",
"github_project": "secretsentry",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "tqdm",
"specs": [
[
">=",
"4.62.0"
]
]
},
{
"name": "numpy",
"specs": [
[
">=",
"1.19.0"
]
]
}
],
"lcname": "secretsentry"
}