llm-redact


Name: llm-redact
Version: 0.1.1
Home page: https://github.com/lookr-fyi/llm-redact
Summary: Privacy-first text redaction using local LLM models with rule generation capabilities
Upload time: 2025-07-21 04:58:17
Author: LLM Redact Contributors
Requires Python: <4.0.0,>=3.12.4
License: MIT
Keywords: privacy, redaction, llm, pii, data-protection, sensitive-data, ai
# LLM Redact

Privacy-first text redaction using local LLM models. Automatically detect and redact sensitive information like names, emails, phone numbers, and more.

## Features

- 🔒 **Privacy-first** - Uses local LLM models, no data sent to external services
- 🚀 **Simple API** - One-liner redaction: `llm_redact.mask(text)`
- 💾 **Smart Caching** - SQLite database for caching and history
- 🔧 **Configurable** - Custom rules, models, and database connections
- 📊 **Tracking** - Full history and analytics of redaction operations

## Installation

```bash
pip install llm-redact
```

## Quick Start

```python
import llm_redact

# Simple redaction
result = llm_redact.mask("Hi, I'm John Doe from john@example.com")
print(result.redacted_text)
# Output: "Hi, I'm |_NAME_A1B2C3D4_| from |_EMAIL_E5F6G7H8_|"

print(result.replacements)
# Output: [
#   Replacement(original_text="John Doe", replacement_text="|_NAME_A1B2C3D4_|"),
#   Replacement(original_text="john@example.com", replacement_text="|_EMAIL_E5F6G7H8_|")
# ]

# Note: Placeholders contain unique IDs and can be stored in database for restoration
# Each placeholder like |_NAME_A1B2C3D4_| maps to original text via database lookup
```
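
Each placeholder maps back to the original text through the database, but the mapping returned in `result.replacements` can also be reversed in memory. A minimal sketch, not the library's own restore API, assuming the `Replacement` attributes shown above:

```python
# Minimal sketch: undo the redaction using the returned replacements.
# The database-backed restoration mentioned above is a separate mechanism.
def restore(redacted_text, replacements):
    restored = redacted_text
    for r in replacements:
        restored = restored.replace(r.replacement_text, r.original_text)
    return restored

print(restore(result.redacted_text, result.replacements))
# "Hi, I'm John Doe from john@example.com"
```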

## Configuration

### Environment Variables

```bash
# LLM Host (default: http://localhost:8000)
export LLM_REDACT_LLM_HOST_URL=http://localhost:8000

# Database (default: sqlite:///llm_redact.db)
export LLM_REDACT_DATABASE_URL=sqlite:///my_redact.db

# Model (default: gemma3:1b)
export LLM_REDACT_DEFAULT_MODEL=gemma3:1b

# Caching (default: True)
export LLM_REDACT_ENABLE_CACHING=true
```
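
These variables can also be set from Python before the package is used. A minimal sketch, assuming the settings are read from the environment when the client is first created:

```python
import os

# Assumption: llm_redact picks these up from the environment at first use,
# so they must be set before calling llm_redact.mask().
os.environ["LLM_REDACT_LLM_HOST_URL"] = "http://localhost:8000"
os.environ["LLM_REDACT_DATABASE_URL"] = "sqlite:///my_redact.db"
os.environ["LLM_REDACT_DEFAULT_MODEL"] = "gemma3:1b"
os.environ["LLM_REDACT_ENABLE_CACHING"] = "true"

import llm_redact

result = llm_redact.mask("Contact Jane Doe at jane@example.com")
```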

### Custom Database

```python
import llm_redact

# Use PostgreSQL
llm_redact.configure_client(
    database_url="postgresql://user:pass@localhost/redact_db"
)

# Use custom LLM host
llm_redact.configure_client(
    llm_host_url="http://my-llm-server:8000"
)
```

## Advanced Usage

### Custom Rules

```python
from llm_redact import RedactionRule

custom_rules = [
    RedactionRule(
        name="Replace SSN with [SSN]", 
        description="Social Security Numbers",
        data_type="SSN"
    ),
    RedactionRule(
        name="Replace addresses with [ADDRESS]", 
        description="Physical addresses",
        data_type="ADDRESS"
    )
]

result = llm_redact.mask(
    "My SSN is 123-45-6789 and I live at 123 Main St",
    rules=custom_rules
)
```

### Using the Client Directly

```python
from llm_redact import LLMRedactClient

client = LLMRedactClient(
    llm_host_url="http://localhost:8000",
    database_url="sqlite:///custom.db"
)

result = client.mask("Sensitive text here")

# Get history
history = client.get_history(limit=10)

# Create custom rules
rule = client.create_rule(
    name="Replace API keys with [API_KEY]",
    description="API keys and tokens"
)
```

## Prerequisites

1. **LLM Host Server**: Run the llm-redact host server locally:
   ```bash
   # Install and run the LLM host
   ollama serve
   ollama pull gemma3:1b
   
   # Run llm-redact host server
   python -m llm_redact_host
   ```

2. **Database**: SQLite (default) or any SQLAlchemy-supported database

## Supported Redaction Types

- Personal names → `|_NAME_XXXX_|`
- Email addresses → `|_EMAIL_XXXX_|`
- Phone numbers → `|_PHONE_XXXX_|`
- Countries → `|_COUNTRY_XXXX_|`
- Universities → `|_UNIVERSITY_XXXX_|`
- Job titles → `|_JOB_TITLE_XXXX_|`
- Addresses → `|_ADDRESS_XXXX_|`
- Social Security Numbers → `|_SSN_XXXX_|`
- Credit card numbers → `|_CREDIT_CARD_XXXX_|`

Where `XXXX` is a unique 8-character hash ID for each piece of data.
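
Since every placeholder follows the `|_TYPE_XXXX_|` pattern, redacted output can be scanned for what was removed without touching the database. A minimal sketch; the regex is an assumption based on the documented format, not part of the library:

```python
import re

# Placeholder format documented above: |_TYPE_XXXX_|, where XXXX is an
# 8-character ID and TYPE may itself contain underscores (e.g. JOB_TITLE).
PLACEHOLDER_RE = re.compile(r"\|_([A-Z_]+)_([A-Z0-9]{8})_\|")

redacted = "Hi, I'm |_NAME_A1B2C3D4_| from |_EMAIL_E5F6G7H8_|"
for data_type, placeholder_id in PLACEHOLDER_RE.findall(redacted):
    print(data_type, placeholder_id)
# NAME A1B2C3D4
# EMAIL E5F6G7H8
```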

## API Reference

### `llm_redact.mask(text, rules=None, model=None)`

Redact sensitive information from text.

**Parameters:**
- `text` (str): Text to redact
- `rules` (list, optional): Custom redaction rules
- `model` (str, optional): LLM model to use

**Returns:** `RedactionResult` object

### `RedactionResult`

- `original_text`: Original input text
- `redacted_text`: Text with sensitive data redacted
- `replacements`: List of replacements made
- `is_redacted`: Whether any redactions were made
- `processing_time_ms`: Processing time in milliseconds
- `cached`: Whether result was from cache
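
A short example of inspecting these fields (field names as documented above; the printed values are illustrative):

```python
import llm_redact

result = llm_redact.mask("Call me at 555-123-4567")

if result.is_redacted:
    # Summarize what was redacted and how the request was served.
    print(f"{len(result.replacements)} replacement(s) "
          f"in {result.processing_time_ms} ms (cached: {result.cached})")
    print(result.redacted_text)
else:
    print("Nothing sensitive found:", result.original_text)
```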

## License

MIT License
