# Inconnu
## What is Inconnu?
Inconnu is a GDPR-compliant data privacy tool designed for entity redaction and de-anonymization. It provides cutting-edge NLP-based tools for anonymizing and pseudonymizing text data while maintaining data utility, ensuring your business meets stringent privacy regulations.
## Why Inconnu?
1. **Seamless Compliance**: Inconnu simplifies the complexity of GDPR and other privacy laws, making sure your data handling practices are always in line with legal standards.
2. **State-of-the-Art NLP**: Built on spaCy models and custom entity recognition, Inconnu reliably detects personal identifiers and handles them consistently.
3. **Transparency and Trust**: Complete processing documentation with timestamping, hashing, and entity mapping for full audit trails.
4. **Reversible Processing**: Support for both anonymization and pseudonymization with complete de-anonymization capabilities.
5. **Performance Optimized**: Fast processing with singleton pattern optimization and configurable text length limits.
## Installation
### Prerequisites
- Python 3.10 or higher
- pip (Python package manager)
### Install from PyPI
```bash
# Basic installation (without language models)
pip install inconnu

# Install with English language support (quotes keep shells like zsh from expanding the brackets)
pip install "inconnu[en]"

# Install with specific language support
pip install "inconnu[de]"  # German
pip install "inconnu[fr]"  # French
pip install "inconnu[es]"  # Spanish
pip install "inconnu[it]"  # Italian

# Install with multiple languages
pip install "inconnu[en,de,fr]"

# Install with all language support
pip install "inconnu[all]"
```
### Download Language Models
After installation, download the required spaCy models:
```bash
# Using the built-in CLI tool
inconnu-download en # Download default English model
inconnu-download de fr # Download German and French models
inconnu-download en --size large # Download large English model
inconnu-download all # Download all default models
inconnu-download --list # List all available models
# Or using spaCy directly
python -m spacy download en_core_web_sm
python -m spacy download de_core_news_sm
```
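Because the language models are separate packages, it can be useful to verify that a model is importable before loading it. A minimal sketch using only the standard library (spaCy models install as regular Python packages, so `find_spec` can locate them; the helper name is illustrative):

```python
from importlib.util import find_spec


def model_installed(model_name: str) -> bool:
    """Return True if a spaCy model package (e.g. en_core_web_sm) is importable."""
    return find_spec(model_name) is not None


# Example: check before initializing, and point users at the download command
if not model_installed("en_core_web_sm"):
    print("Model missing - run: inconnu-download en")
```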
### Install from Source
1. **Clone the repository**:
```bash
git clone https://github.com/0xjgv/inconnu.git
cd inconnu
```
2. **Install with UV (recommended for development)**:
```bash
make install # Install dependencies
make model-de # Download German model
make test # Run tests
```
3. **Or install with pip**:
```bash
pip install -e . # Install in editable mode
python -m spacy download en_core_web_sm
```
### Installing Additional Models
Inconnu supports multiple spaCy models for enhanced accuracy. The default `en_core_web_sm` model is lightweight and fast, but you can install more accurate models:
#### English Models
```bash
# Small model (default) - 15MB, fast processing
uv run python -m spacy download en_core_web_sm
# Large model - 560MB, higher accuracy
uv run python -m spacy download en_core_web_lg
# Transformer model - 438MB, highest accuracy
uv run python -m spacy download en_core_web_trf
```
#### Additional Language Models
```bash
# German model
make model-de
uv run python -m spacy download de_core_news_sm
# Italian model
make model-it
uv run python -m spacy download it_core_news_sm
# Spanish model
make model-es
uv run python -m spacy download es_core_news_sm
# French model
make model-fr
uv run python -m spacy download fr_core_news_sm
# For enhanced accuracy (manual installation)
# Medium German model - better accuracy
uv run python -m spacy download de_core_news_md
# Large German model - highest accuracy
uv run python -m spacy download de_core_news_lg
```
#### Using Different Models
To use a different model, specify it when initializing the EntityRedactor:
```python
from inconnu.nlp.entity_redactor import EntityRedactor, SpacyModels
# Use transformer model for highest accuracy
entity_redactor = EntityRedactor(
    custom_components=None,
    language="en",
    model_name=SpacyModels.EN_CORE_WEB_TRF,  # High-accuracy transformer model
)
```
**Model Selection Guide:**
- `en_core_web_sm`: Fast and lightweight, good for high-volume workloads
- `en_core_web_lg`: Better accuracy, moderate processing time
- `en_core_web_trf`: Highest accuracy, slower processing (recommended for sensitive data)
For more models, visit the [spaCy Models Directory](https://spacy.io/models).
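The selection guide can be encoded as a small lookup; the model names are the standard spaCy English pipelines, while the profile labels here are purely illustrative:

```python
# Map a workload profile to an English spaCy model name (per the guide above).
MODEL_BY_PROFILE = {
    "fast": "en_core_web_sm",       # high-volume processing
    "balanced": "en_core_web_lg",   # better accuracy, moderate speed
    "accurate": "en_core_web_trf",  # highest accuracy, slower
}


def choose_model(profile: str = "fast") -> str:
    try:
        return MODEL_BY_PROFILE[profile]
    except KeyError:
        raise ValueError(f"unknown profile: {profile!r}") from None
```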
## Development Setup
### Available Commands
```bash
# Development workflow
make install # Install all dependencies
make model-de # Download German spaCy model
make model-it # Download Italian spaCy model
make model-es # Download Spanish spaCy model
make model-fr # Download French spaCy model
make test # Run full test suite
make lint # Check code with ruff
make format # Format code with ruff
make fix # Auto-fix linting issues
make clean # Format, lint, fix, and clean cache
make update-deps # Update dependencies
```
### Running Tests
```bash
# Run all tests
make test
# Run with verbose output
uv run pytest -vv
# Run specific test file
uv run pytest tests/test_inconnu.py -vv
# Run specific test class
uv run pytest tests/test_inconnu.py::TestInconnuPseudonymizer -vv
```
## Usage Examples
### Basic Text Anonymization
```python
from inconnu import Inconnu
# Simple initialization - no Config class required!
inconnu = Inconnu() # Uses sensible defaults
# Simple anonymization - just the redacted text
text = "John Doe from New York visited Paris last summer."
redacted = inconnu.redact(text)
print(redacted)
# Output: "[PERSON] from [GPE] visited [GPE] [DATE]."
# Pseudonymization - get both redacted text and entity mapping
redacted_text, entity_map = inconnu.pseudonymize(text)
print(redacted_text)
# Output: "[PERSON_0] from [GPE_0] visited [GPE_1] [DATE_0]."
print(entity_map)
# Output: {'[PERSON_0]': 'John Doe', '[GPE_0]': 'New York', '[GPE_1]': 'Paris', '[DATE_0]': 'last summer'}
# Advanced usage with full metadata (original API)
result = inconnu(text=text)
print(result.redacted_text)
print(f"Processing time: {result.processing_time_ms:.2f}ms")
```
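The entity map returned by `pseudonymize` is all that is needed to restore the original text. A minimal sketch using plain string replacement (the de-anonymization API Inconnu ships may differ; this only illustrates how the mapping reverses the process):

```python
def deanonymize(redacted_text: str, entity_map: dict[str, str]) -> str:
    """Replace placeholders like [PERSON_0] with their original values."""
    for placeholder, original in entity_map.items():
        redacted_text = redacted_text.replace(placeholder, original)
    return redacted_text


restored = deanonymize(
    "[PERSON_0] from [GPE_0] visited [GPE_1] [DATE_0].",
    {"[PERSON_0]": "John Doe", "[GPE_0]": "New York",
     "[GPE_1]": "Paris", "[DATE_0]": "last summer"},
)
print(restored)  # "John Doe from New York visited Paris last summer."
```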
### Async and Batch Processing
```python
import asyncio

from inconnu import Inconnu

# Async processing for non-blocking operations
async def process_texts():
    inconnu = Inconnu()

    # Single async processing
    text = "John Doe called from +1-555-123-4567"
    redacted = await inconnu.redact_async(text)
    print(redacted)  # "[PERSON] called from [PHONE_NUMBER]"

    # Batch async processing
    texts = [
        "Alice Smith visited Berlin",
        "Bob Jones went to Tokyo",
        "Carol Brown lives in Paris",
    ]
    results = await inconnu.redact_batch_async(texts)
    for result in results:
        print(result)

asyncio.run(process_texts())
```
### Customer Service Email Processing
```python
from inconnu import Inconnu

inconnu = Inconnu()

# Process customer service email with personal data
customer_email = """
Dear SolarTech Team,
I am Max Mustermann living at Hauptstraße 50, 80331 Munich, Germany.
My phone number is +49 89 1234567 and my email is max@example.com.
I need to return my solar modules (Order: ST-78901) due to relocation.
Best regards,
Max Mustermann
"""
# Simple redaction
redacted = inconnu.redact(customer_email)
print(redacted)
# Personal identifiers are automatically detected and redacted
```
### Multi-language Support
```python
from inconnu import Inconnu

# German language processing - simplified!
inconnu_de = Inconnu("de")  # Just specify the language
german_text = "Herr Schmidt aus München besuchte Berlin im März."
redacted = inconnu_de.redact(german_text)
print(redacted)
# Output: "[PERSON] aus [GPE] besuchte [GPE] [DATE]."
```
### Custom Entity Recognition
```python
from inconnu import Inconnu, NERComponent
import re
# Add custom entity recognition
custom_components = [
    NERComponent(
        label="CREDIT_CARD",
        pattern=re.compile(r"\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b"),
        processing_function=None,
    )
]

# Simple initialization with custom components
inconnu_custom = Inconnu(
    language="en",
    custom_components=custom_components,
)
# Test custom entity detection
text = "My card number is 1234 5678 9012 3456"
redacted = inconnu_custom.redact(text)
print(redacted) # "My card number is [CREDIT_CARD]"
```
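Any compiled regex can serve as the `pattern`. As another illustration, here is a hypothetical pattern for the order numbers seen in the sample email earlier (the `ST-` plus five digits format is assumed, not part of Inconnu):

```python
import re

# Hypothetical order-number format, e.g. "ST-78901"
ORDER_PATTERN = re.compile(r"\bST-\d{5}\b")

# Would be registered the same way as CREDIT_CARD above, e.g.:
# NERComponent(label="ORDER_ID", pattern=ORDER_PATTERN, processing_function=None)

match = ORDER_PATTERN.search("I need to return my solar modules (Order: ST-78901).")
print(match.group())  # ST-78901
```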
### Context Manager for Resource Management
```python
from inconnu import Inconnu

# Automatic resource cleanup
with Inconnu() as inc:
    redacted = inc.redact("Sensitive data about John Doe")
    print(redacted)
# Resources automatically cleaned up
```
### Error Handling
```python
from inconnu import Inconnu, TextTooLongError, ProcessingError
inconnu = Inconnu(max_text_length=100) # Set small limit for demo
try:
    long_text = "x" * 200  # Exceeds limit
    result = inconnu.redact(long_text)
except TextTooLongError as e:
    print(f"Text too long: {e}")
    # Error includes helpful suggestions for resolution
except ProcessingError as e:
    print(f"Processing failed: {e}")
```
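When a text legitimately exceeds the limit, one workaround is to split it and redact each piece separately. A naive sketch (splitting on paragraph boundaries, since a fixed-length cut can slice an entity in half; `redact_fn` stands in for `inconnu.redact`):

```python
def redact_in_chunks(text: str, redact_fn, max_len: int = 100) -> str:
    """Split text on paragraph boundaries, keeping each chunk under max_len."""
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        if current and len(current) + len(paragraph) + 2 > max_len:
            chunks.append(current)
            current = paragraph
        else:
            current = current + "\n\n" + paragraph if current else paragraph
    if current:
        chunks.append(current)
    # Redact each chunk independently and reassemble
    return "\n\n".join(redact_fn(chunk) for chunk in chunks)
```

Note that entity numbering restarts per chunk under pseudonymization, so this approach is best suited to plain anonymization.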
## Use Cases
### 1. **Customer Support Systems**
Automatically redact personal information from customer service emails, chat logs, and support tickets while maintaining context for analysis.
### 2. **Legal Document Processing**
Anonymize legal documents, contracts, and case files for training, analysis, or public release while ensuring GDPR compliance.
### 3. **Medical Record Anonymization**
Process medical records and research data to remove patient identifiers while preserving clinical information for research purposes.
### 4. **Financial Transaction Analysis**
Redact personal financial information from transaction logs and banking communications for fraud analysis and compliance reporting.
### 5. **Survey and Feedback Analysis**
Anonymize customer feedback, survey responses, and user-generated content for analysis while protecting respondent privacy.
### 6. **Training Data Preparation**
Prepare training datasets for machine learning models by removing personal identifiers from text data while maintaining semantic meaning.
## Supported Entity Types
- **Standard Entities**: PERSON, GPE (locations), DATE, ORG, MONEY
- **Custom Entities**: EMAIL, IBAN, PHONE_NUMBER
- **Enhanced Detection**: Person titles (Dr, Mr, Ms), international phone numbers
- **Multilingual**: English, German, Italian, Spanish, and French language support
## Features
- **Robust Entity Detection**: Advanced NLP with spaCy models and custom regex patterns
- **Dual Processing Modes**: Anonymization (`[PERSON]`) and pseudonymization (`[PERSON_0]`)
- **Complete Audit Trail**: Timestamping, hashing, and processing metadata
- **Reversible Processing**: Full de-anonymization capabilities with entity mapping
- **Performance Optimized**: Singleton pattern for model loading, configurable limits
- **GDPR Compliant**: Built-in data retention policies and compliance features
## Contributing
We welcome contributions to Inconnu! As an open source project, we believe in the power of community collaboration to build better privacy tools.
### How to Contribute
#### 1. **Bug Reports & Feature Requests**
- Open an issue on GitHub with detailed descriptions
- Include code examples and expected vs actual behavior
- Tag issues appropriately (bug, enhancement, documentation)
#### 2. **Code Contributions**
```bash
# Fork the repository and create a feature branch
git checkout -b feature/your-feature-name
# Make your changes and ensure tests pass
make test
make lint
# Submit a pull request with:
# - Clear description of changes
# - Test coverage for new features
# - Updated documentation if needed
```
#### 3. **Development Guidelines**
- Follow existing code style and patterns
- Add tests for new functionality
- Update documentation for user-facing changes
- Ensure GDPR compliance considerations are addressed
#### 4. **Areas for Contribution**
- **Language Support**: Add new language models and region-specific entity detection
- **Custom Entities**: Implement detection for industry-specific identifiers
- **Performance**: Optimize processing speed and memory usage
- **Documentation**: Improve examples, tutorials, and API documentation
- **Testing**: Expand test coverage and edge case handling
#### 5. **Code Review Process**
- All contributions require code review
- Automated tests must pass
- Documentation updates are appreciated
- Maintain backward compatibility when possible
### Community Guidelines
- **Be Respectful**: Foster an inclusive environment for all contributors
- **Privacy First**: Always consider privacy implications of changes
- **Security Minded**: Report security issues privately before public disclosure
- **Quality Focused**: Prioritize code quality and comprehensive testing
### Getting Help
- **Discussions**: Use GitHub Discussions for questions and ideas
- **Issues**: Report bugs and request features through GitHub Issues
- **Documentation**: Check existing docs and contribute improvements
Thank you for helping make Inconnu a better tool for data privacy and GDPR compliance!
## Publishing to PyPI
### For Maintainers
To publish a new version to PyPI:
1. **Configure Trusted Publisher** (first time only):
- Go to https://pypi.org/manage/project/inconnu/settings/publishing/
- Add a new trusted publisher:
- Publisher: GitHub
- Organization/username: `0xjgv`
- Repository name: `inconnu`
- Workflow name: `publish.yml`
- Environment name: `pypi` (optional but recommended)
- For Test PyPI, do the same at https://test.pypi.org with environment name: `testpypi`
2. **Update Version**: Update the version in `pyproject.toml` and `inconnu/__init__.py`
3. **Create a Git Tag**:
```bash
git tag v0.1.0
git push origin v0.1.0
```
4. **GitHub Actions**: The workflow will automatically:
- Run tests on Python 3.10, 3.11, and 3.12
- Build the package
- Publish to PyPI using Trusted Publisher (no API tokens needed!)
- Generate PEP 740 attestations for security
5. **Test PyPI Publishing**:
- Use workflow_dispatch to manually trigger Test PyPI publishing
- Go to Actions → Publish to PyPI → Run workflow
### Manual Publishing (if needed)
```bash
# Build the package
uv build
# Check the package
twine check dist/*
# Upload to Test PyPI (requires API token)
twine upload --repository testpypi dist/*
# Upload to PyPI (requires API token)
twine upload dist/*
```
### GitHub Environments (Recommended)
Configure GitHub environments for additional security:
1. Go to Settings → Environments
2. Create `pypi` and `testpypi` environments
3. Add protection rules:
   - Required reviewers
   - Restrict to specific tags (e.g., `v*`)
   - Add deployment branch restrictions
## Additional Resources
- [spaCy Models Directory](https://spacy.io/models) - Complete list of available language models
- [spaCy Model Releases](https://github.com/explosion/spacy-models) - GitHub repository for model updates
- [pgeocode](https://pypi.org/project/pgeocode/) - Geographic location processing (potential future integration)