# Inconnu
## What is Inconnu?
Inconnu is a GDPR-compliant data privacy tool designed for entity redaction and de-anonymization. It provides cutting-edge NLP-based tools for anonymizing and pseudonymizing text data while maintaining data utility, ensuring your business meets stringent privacy regulations.
## Why Inconnu?
1. **Seamless Compliance**: Inconnu simplifies the complexity of GDPR and other privacy laws, making sure your data handling practices are always in line with legal standards.
2. **State-of-the-Art NLP**: Built on spaCy models and custom entity recognition, Inconnu reliably detects personal identifiers and handles them consistently.
3. **Transparency and Trust**: Complete processing documentation with timestamping, hashing, and entity mapping for full audit trails.
4. **Reversible Processing**: Support for both anonymization and pseudonymization with complete de-anonymization capabilities.
5. **Performance Optimized**: Fast processing with singleton pattern optimization and configurable text length limits.
## Installation
### Prerequisites
- Python 3.10 or higher
- pip (Python package manager)
### Install from PyPI
```bash
# Basic installation (without language models)
pip install inconnu

# Install with English language support (quotes keep shells like zsh from expanding the brackets)
pip install "inconnu[en]"

# Install with specific language support
pip install "inconnu[de]"  # German
pip install "inconnu[fr]"  # French
pip install "inconnu[es]"  # Spanish
pip install "inconnu[it]"  # Italian

# Install with multiple languages
pip install "inconnu[en,de,fr]"

# Install with all language support
pip install "inconnu[all]"
```
### Download Language Models
After installation, download the required spaCy models:
```bash
# Using the built-in CLI tool
inconnu-download en # Download default English model
inconnu-download de fr # Download German and French models
inconnu-download en --size large # Download large English model
inconnu-download all # Download all default models
inconnu-download --list # List all available models
# Or using spaCy directly
python -m spacy download en_core_web_sm
python -m spacy download de_core_news_sm
```
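Because the language models are separate packages, it can be useful to verify that a model is importable before loading it. A minimal sketch using only the standard library (spaCy models install as regular Python packages, so `find_spec` can locate them; the helper name is illustrative):

```python
from importlib.util import find_spec


def model_installed(model_name: str) -> bool:
    """Return True if a spaCy model package (e.g. en_core_web_sm) is importable."""
    return find_spec(model_name) is not None


# Example: check before initializing, and point users at the download command
if not model_installed("en_core_web_sm"):
    print("Model missing - run: inconnu-download en")
```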
### Install from Source
1. **Clone the repository**:
```bash
git clone https://github.com/0xjgv/inconnu.git
cd inconnu
```
2. **Install with UV (recommended for development)**:
```bash
make install # Install dependencies
make model-de # Download German model
make test # Run tests
```
3. **Or install with pip**:
```bash
pip install -e . # Install in editable mode
python -m spacy download en_core_web_sm
```
### Installing Additional Models
Inconnu supports multiple spaCy models for enhanced accuracy. The default `en_core_web_sm` model is lightweight and fast, but you can install more accurate models:
#### English Models
```bash
# Small model (default) - 15MB, fast processing
uv run python -m spacy download en_core_web_sm
# Large model - 560MB, higher accuracy
uv run python -m spacy download en_core_web_lg
# Transformer model - 438MB, highest accuracy
uv run python -m spacy download en_core_web_trf
```
#### Additional Language Models
```bash
# German model
make model-de
uv run python -m spacy download de_core_news_sm
# Italian model
make model-it
uv run python -m spacy download it_core_news_sm
# Spanish model
make model-es
uv run python -m spacy download es_core_news_sm
# French model
make model-fr
uv run python -m spacy download fr_core_news_sm
# For enhanced accuracy (manual installation)
# Medium German model - better accuracy
uv run python -m spacy download de_core_news_md
# Large German model - highest accuracy
uv run python -m spacy download de_core_news_lg
```
#### Using Different Models
To use a different model, specify it when initializing the EntityRedactor:
```python
from inconnu.nlp.entity_redactor import EntityRedactor, SpacyModels
# Use transformer model for highest accuracy
entity_redactor = EntityRedactor(
    custom_components=None,
    language="en",
    model_name=SpacyModels.EN_CORE_WEB_TRF,  # High-accuracy transformer model
)
```
**Model Selection Guide:**
- `en_core_web_sm`: Fast and lightweight, good for high-volume workloads
- `en_core_web_lg`: Better accuracy, moderate processing time
- `en_core_web_trf`: Highest accuracy, slower processing (recommended for sensitive data)
For more models, visit the [spaCy Models Directory](https://spacy.io/models).
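The selection guide can be encoded as a small lookup; the model names are the standard spaCy English pipelines, while the profile labels here are purely illustrative:

```python
# Map a workload profile to an English spaCy model name (per the guide above).
MODEL_BY_PROFILE = {
    "fast": "en_core_web_sm",       # high-volume processing
    "balanced": "en_core_web_lg",   # better accuracy, moderate speed
    "accurate": "en_core_web_trf",  # highest accuracy, slower
}


def choose_model(profile: str = "fast") -> str:
    try:
        return MODEL_BY_PROFILE[profile]
    except KeyError:
        raise ValueError(f"unknown profile: {profile!r}") from None
```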
## Development Setup
### Available Commands
```bash
# Development workflow
make install # Install all dependencies
make model-de # Download German spaCy model
make model-it # Download Italian spaCy model
make model-es # Download Spanish spaCy model
make model-fr # Download French spaCy model
make test # Run full test suite
make lint # Check code with ruff
make format # Format code with ruff
make fix # Auto-fix linting issues
make clean # Format, lint, fix, and clean cache
make update-deps # Update dependencies
```
### Running Tests
```bash
# Run all tests
make test
# Run with verbose output
uv run pytest -vv
# Run specific test file
uv run pytest tests/test_inconnu.py -vv
# Run specific test class
uv run pytest tests/test_inconnu.py::TestInconnuPseudonymizer -vv
```
## Usage Examples
### Basic Text Anonymization
```python
from inconnu import Inconnu
# Simple initialization - no Config class required!
inconnu = Inconnu() # Uses sensible defaults
# Simple anonymization - just the redacted text
text = "John Doe from New York visited Paris last summer."
redacted = inconnu.redact(text)
print(redacted)
# Output: "[PERSON] from [GPE] visited [GPE] [DATE]."
# Pseudonymization - get both redacted text and entity mapping
redacted_text, entity_map = inconnu.pseudonymize(text)
print(redacted_text)
# Output: "[PERSON_0] from [GPE_0] visited [GPE_1] [DATE_0]."
print(entity_map)
# Output: {'[PERSON_0]': 'John Doe', '[GPE_0]': 'New York', '[GPE_1]': 'Paris', '[DATE_0]': 'last summer'}
# Advanced usage with full metadata (original API)
result = inconnu(text=text)
print(result.redacted_text)
print(f"Processing time: {result.processing_time_ms:.2f}ms")
```
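The entity map returned by `pseudonymize` is all that is needed to restore the original text. A minimal sketch using plain string replacement (the de-anonymization API Inconnu ships may differ; this only illustrates how the mapping reverses the process):

```python
def deanonymize(redacted_text: str, entity_map: dict[str, str]) -> str:
    """Replace placeholders like [PERSON_0] with their original values."""
    for placeholder, original in entity_map.items():
        redacted_text = redacted_text.replace(placeholder, original)
    return redacted_text


restored = deanonymize(
    "[PERSON_0] from [GPE_0] visited [GPE_1] [DATE_0].",
    {"[PERSON_0]": "John Doe", "[GPE_0]": "New York",
     "[GPE_1]": "Paris", "[DATE_0]": "last summer"},
)
print(restored)  # "John Doe from New York visited Paris last summer."
```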
### Async and Batch Processing
```python
import asyncio

from inconnu import Inconnu

# Async processing for non-blocking operations
async def process_texts():
    inconnu = Inconnu()

    # Single async processing
    text = "John Doe called from +1-555-123-4567"
    redacted = await inconnu.redact_async(text)
    print(redacted)  # "[PERSON] called from [PHONE_NUMBER]"

    # Batch async processing
    texts = [
        "Alice Smith visited Berlin",
        "Bob Jones went to Tokyo",
        "Carol Brown lives in Paris",
    ]
    results = await inconnu.redact_batch_async(texts)
    for result in results:
        print(result)

asyncio.run(process_texts())
```
### Customer Service Email Processing
```python
from inconnu import Inconnu

inconnu = Inconnu()

# Process customer service email with personal data
customer_email = """
Dear SolarTech Team,
I am Max Mustermann living at Hauptstraße 50, 80331 Munich, Germany.
My phone number is +49 89 1234567 and my email is max@example.com.
I need to return my solar modules (Order: ST-78901) due to relocation.
Best regards,
Max Mustermann
"""
# Simple redaction
redacted = inconnu.redact(customer_email)
print(redacted)
# Personal identifiers are automatically detected and redacted
```
### Multi-language Support
```python
from inconnu import Inconnu

# German language processing - simplified!
inconnu_de = Inconnu("de")  # Just specify the language
german_text = "Herr Schmidt aus München besuchte Berlin im März."
redacted = inconnu_de.redact(german_text)
print(redacted)
# Output: "[PERSON] aus [GPE] besuchte [GPE] [DATE]."
```
### Custom Entity Recognition
```python
from inconnu import Inconnu, NERComponent
import re
# Add custom entity recognition
custom_components = [
    NERComponent(
        label="CREDIT_CARD",
        pattern=re.compile(r"\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b"),
        processing_function=None,
    )
]

# Simple initialization with custom components
inconnu_custom = Inconnu(
    language="en",
    custom_components=custom_components,
)
# Test custom entity detection
text = "My card number is 1234 5678 9012 3456"
redacted = inconnu_custom.redact(text)
print(redacted) # "My card number is [CREDIT_CARD]"
```
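Any compiled regex can serve as the `pattern`. As another illustration, here is a hypothetical pattern for the order numbers seen in the sample email earlier (the `ST-` plus five digits format is assumed, not part of Inconnu):

```python
import re

# Hypothetical order-number format, e.g. "ST-78901"
ORDER_PATTERN = re.compile(r"\bST-\d{5}\b")

# Would be registered the same way as CREDIT_CARD above, e.g.:
# NERComponent(label="ORDER_ID", pattern=ORDER_PATTERN, processing_function=None)

match = ORDER_PATTERN.search("I need to return my solar modules (Order: ST-78901).")
print(match.group())  # ST-78901
```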
### Context Manager for Resource Management
```python
from inconnu import Inconnu

# Automatic resource cleanup
with Inconnu() as inc:
    redacted = inc.redact("Sensitive data about John Doe")
    print(redacted)
# Resources automatically cleaned up
```
### Error Handling
```python
from inconnu import Inconnu, TextTooLongError, ProcessingError
inconnu = Inconnu(max_text_length=100) # Set small limit for demo
try:
    long_text = "x" * 200  # Exceeds limit
    result = inconnu.redact(long_text)
except TextTooLongError as e:
    print(f"Text too long: {e}")
    # Error includes helpful suggestions for resolution
except ProcessingError as e:
    print(f"Processing failed: {e}")
```
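When a text legitimately exceeds the limit, one workaround is to split it and redact each piece separately. A naive sketch (splitting on paragraph boundaries, since a fixed-length cut can slice an entity in half; `redact_fn` stands in for `inconnu.redact`):

```python
def redact_in_chunks(text: str, redact_fn, max_len: int = 100) -> str:
    """Split text on paragraph boundaries, keeping each chunk under max_len."""
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        if current and len(current) + len(paragraph) + 2 > max_len:
            chunks.append(current)
            current = paragraph
        else:
            current = current + "\n\n" + paragraph if current else paragraph
    if current:
        chunks.append(current)
    # Redact each chunk independently and reassemble
    return "\n\n".join(redact_fn(chunk) for chunk in chunks)
```

Note that entity numbering restarts per chunk under pseudonymization, so this approach is best suited to plain anonymization.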
## Use Cases
### 1. **Customer Support Systems**
Automatically redact personal information from customer service emails, chat logs, and support tickets while maintaining context for analysis.
### 2. **Legal Document Processing**
Anonymize legal documents, contracts, and case files for training, analysis, or public release while ensuring GDPR compliance.
### 3. **Medical Record Anonymization**
Process medical records and research data to remove patient identifiers while preserving clinical information for research purposes.
### 4. **Financial Transaction Analysis**
Redact personal financial information from transaction logs and banking communications for fraud analysis and compliance reporting.
### 5. **Survey and Feedback Analysis**
Anonymize customer feedback, survey responses, and user-generated content for analysis while protecting respondent privacy.
### 6. **Training Data Preparation**
Prepare training datasets for machine learning models by removing personal identifiers from text data while maintaining semantic meaning.
## Supported Entity Types
- **Standard Entities**: PERSON, GPE (locations), DATE, ORG, MONEY
- **Custom Entities**: EMAIL, IBAN, PHONE_NUMBER
- **Enhanced Detection**: Person titles (Dr, Mr, Ms), international phone numbers
- **Multilingual**: English, German, Italian, Spanish, and French language support
## Features
- **Robust Entity Detection**: Advanced NLP with spaCy models and custom regex patterns
- **Dual Processing Modes**: Anonymization (`[PERSON]`) and pseudonymization (`[PERSON_0]`)
- **Complete Audit Trail**: Timestamping, hashing, and processing metadata
- **Reversible Processing**: Full de-anonymization capabilities with entity mapping
- **Performance Optimized**: Singleton pattern for model loading, configurable limits
- **GDPR Compliant**: Built-in data retention policies and compliance features
## Contributing
We welcome contributions to Inconnu! As an open source project, we believe in the power of community collaboration to build better privacy tools.
### How to Contribute
#### 1. **Bug Reports & Feature Requests**
- Open an issue on GitHub with detailed descriptions
- Include code examples and expected vs actual behavior
- Tag issues appropriately (bug, enhancement, documentation)
#### 2. **Code Contributions**
```bash
# Fork the repository and create a feature branch
git checkout -b feature/your-feature-name
# Make your changes and ensure tests pass
make test
make lint
# Submit a pull request with:
# - Clear description of changes
# - Test coverage for new features
# - Updated documentation if needed
```
#### 3. **Development Guidelines**
- Follow existing code style and patterns
- Add tests for new functionality
- Update documentation for user-facing changes
- Ensure GDPR compliance considerations are addressed
#### 4. **Areas for Contribution**
- **Language Support**: Add new language models and region-specific entity detection
- **Custom Entities**: Implement detection for industry-specific identifiers
- **Performance**: Optimize processing speed and memory usage
- **Documentation**: Improve examples, tutorials, and API documentation
- **Testing**: Expand test coverage and edge case handling
#### 5. **Code Review Process**
- All contributions require code review
- Automated tests must pass
- Documentation updates are appreciated
- Maintain backward compatibility when possible
### Community Guidelines
- **Be Respectful**: Foster an inclusive environment for all contributors
- **Privacy First**: Always consider privacy implications of changes
- **Security Minded**: Report security issues privately before public disclosure
- **Quality Focused**: Prioritize code quality and comprehensive testing
### Getting Help
- **Discussions**: Use GitHub Discussions for questions and ideas
- **Issues**: Report bugs and request features through GitHub Issues
- **Documentation**: Check existing docs and contribute improvements
Thank you for helping make Inconnu a better tool for data privacy and GDPR compliance!
## Publishing to PyPI
### For Maintainers
To publish a new version to PyPI:
1. **Configure Trusted Publisher** (first time only):
- Go to https://pypi.org/manage/project/inconnu/settings/publishing/
- Add a new trusted publisher:
- Publisher: GitHub
- Organization/username: `0xjgv`
- Repository name: `inconnu`
- Workflow name: `publish.yml`
- Environment name: `pypi` (optional but recommended)
- For Test PyPI, do the same at https://test.pypi.org with environment name: `testpypi`
2. **Update Version**: Update the version in `pyproject.toml` and `inconnu/__init__.py`
3. **Create a Git Tag**:
```bash
git tag v0.1.0
git push origin v0.1.0
```
4. **GitHub Actions**: The workflow will automatically:
- Run tests on Python 3.10, 3.11, and 3.12
- Build the package
- Publish to PyPI using Trusted Publisher (no API tokens needed!)
- Generate PEP 740 attestations for security
5. **Test PyPI Publishing**:
- Use workflow_dispatch to manually trigger Test PyPI publishing
- Go to Actions → Publish to PyPI → Run workflow
### Manual Publishing (if needed)
```bash
# Build the package
uv build
# Check the package
twine check dist/*
# Upload to Test PyPI (requires API token)
twine upload --repository testpypi dist/*
# Upload to PyPI (requires API token)
twine upload dist/*
```
### GitHub Environments (Recommended)
Configure GitHub environments for additional security:
1. Go to Settings → Environments
2. Create `pypi` and `testpypi` environments
3. Add protection rules:
   - Required reviewers
   - Restrict to specific tags (e.g., `v*`)
   - Add deployment branch restrictions
## Additional Resources
- [spaCy Models Directory](https://spacy.io/models) - Complete list of available language models
- [spaCy Model Releases](https://github.com/explosion/spacy-models) - GitHub repository for model updates
- [pgeocode](https://pypi.org/project/pgeocode/) - Geographic location processing (potential future integration)