redactyl

Name: redactyl
Version: 0.4.0
Summary: Fast, deterministic PII redaction using type-preserving tokens ([NAME_1], [EMAIL_2]) with perfect reversibility
Homepage: https://github.com/screensailor/redactyl
Author: screensailor
Upload time: 2025-08-14 11:39:41
Requires Python: <3.14,>=3.12
License: MIT
Keywords: ai, anonymization, data-protection, gdpr, gliner, llm, pii, presidio, privacy, redaction
# Redactyl

Type-safe, reversible PII protection for LLM apps — now centered around the `@pii.protect` decorator.

Redactyl replaces sensitive values with stable tokens (for example: `[EMAIL_1]`, `[NAME_FIRST_1]`) before your code talks to an LLM, and restores originals when results come back. It works across Pydantic models, nested containers, sync/async functions, and streams — automatically.

**PII Membrane (“protect” bubble)**
- Inside decorated functions: arguments are redacted; your code and LLMs see tokens.
- Outside: return values and stream yields are unredacted; callers see originals.
- Two-way membrane: redacts on entry, unredacts on exit.
- Mapping source of truth: only incoming arguments build the redaction map; outputs are unredacted using that map.

Warning: You need a spaCy model installed (see Installation). Optional GLiNER improves name component detection.

## Why Redactyl?

- ✅ Zero trust for LLMs: never expose real PII
- ✅ Type-safe: Full Pydantic integration and container traversal
- ✅ Reversible: Get original data back every time
- ✅ Streaming-ready: Works with sync/async generators
- ✅ Intelligent: Understands name relationships and components

### Quickstart (Plain String)

Use `@pii.protect` on ordinary functions too — no Pydantic required. Decorated functions are transparent to callers: from the outside you can’t tell PII protection is happening.

```python
from redactyl.pydantic_integration import PIIConfig

pii = PIIConfig()

@pii.protect
def summarize(text: str) -> str:
    # Inside: text is redacted (e.g., emails → [EMAIL_1])
    return f"Processed: {text}"

print(summarize("Email me at john@example.com"))
# → "Processed: Email me at john@example.com" (unredacted on return)
```

## The `@pii.protect` Moment

Drop a decorator and keep coding. Redactyl figures it out.

```python
from typing import Annotated
from pydantic import BaseModel
from redactyl.pydantic_integration import PIIConfig, pii
from redactyl.types import PIIType

# Zero-config to start; tuned via PIIConfig kwargs
pii = PIIConfig()

class Email(BaseModel):
    sender_name: Annotated[str, pii(PIIType.PERSON, parse_components=True)]
    sender_email: Annotated[str, pii(PIIType.EMAIL)]
    subject: str  # auto-detected
    body: str     # auto-detected

@pii.protect
async def draft_reply(email: Email) -> str:
    # Inside: email fields are redacted, e.g.
    #   "John Smith <john@example.com>" → "[NAME_FIRST_1] [NAME_LAST_1] <[EMAIL_1]>"
    # Call your LLM as usual — it will see tokens
    reply = await llm.generate({
        "subject": f"Re: {email.subject}",
        "body": f"Hi {email.sender_name}, …"
    })
    # Return values are automatically unredacted
    return reply

# What you get back has real PII restored
text = await draft_reply(Email(
    sender_name="John Smith",
    sender_email="john@example.com",
    subject="Project X",
    body="Ping me tomorrow"
))
print(text)  # → "Hi John, … I'll email john@example.com"
```

Why this feels essential:
- Minimal change: add a decorator; keep your LLM calls.
- Smart defaults: auto-detects sync/async/generator functions and Pydantic arguments.
- Transparent: callers see originals; tokens exist only inside the bubble.
- Reversible: tokens round-trip perfectly; originals are restored for outputs.

Name intelligence:
- Full names become the source of truth. Later mentions like "John", "Mr. Appleseed", or just "Appleseed" reuse the same token index.
- Example: "John Appleseed … Appleseed" → "[NAME_FIRST_1] [NAME_LAST_1] … [NAME_LAST_1]".
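Redactyl's detector handles this matching automatically. As a toy illustration of the idea (not the library's internals), component tokens derived from one full name can share a single index, so partial mentions map back to the same person:

```python
# Toy sketch of name-component token reuse (NOT Redactyl's actual implementation).
# A full name establishes index 1; later partial mentions reuse that same index.
def redact_names(text: str, full_name: str, index: int = 1) -> str:
    first, last = full_name.split(" ", 1)
    # Replace the full phrase first so "John Appleseed" wins over bare "John".
    text = text.replace(full_name, f"[NAME_FIRST_{index}] [NAME_LAST_{index}]")
    text = text.replace(first, f"[NAME_FIRST_{index}]")
    text = text.replace(last, f"[NAME_LAST_{index}]")
    return text

print(redact_names("John Appleseed wrote back. Appleseed was brief.", "John Appleseed"))
# → "[NAME_FIRST_1] [NAME_LAST_1] wrote back. [NAME_LAST_1] was brief."
```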

## Progressive Examples

### 1) Basics: Functions and Models

```python
from pydantic import BaseModel
from typing import Annotated
from redactyl.pydantic_integration import PIIConfig, pii
from redactyl.types import PIIType

pii = PIIConfig()

class Message(BaseModel):
    user: Annotated[str, pii(PIIType.PERSON, parse_components=True)]
    email: Annotated[str, pii(PIIType.EMAIL)]
    text: str  # auto-detected

@pii.protect
def handle(msg: Message) -> str:
    # msg is redacted here: "Jane <jane@x.com>" → tokens
    return llm.call(msg.model_dump())  # LLM works with tokens

result = handle(Message(user="Jane Roe", email="jane@x.com", text="Hello"))
print(result)  # Unredacted output
```

### 2) Streaming: Transparent Membrane

Decorated generators work like a two-way membrane: inputs are redacted on entry and every yielded item is unredacted on exit. Consumers of the stream never see tokens — only original values.

```python
from collections.abc import AsyncIterator
from pydantic import BaseModel
from redactyl.pydantic_integration import PIIConfig

class In(BaseModel):
    content: str

class Out(BaseModel):
    content: str

# Optional: observe input-derived state after a stream completes (for persistence/debugging)
captured_state = None
pii = PIIConfig(on_stream_complete=lambda st: globals().__setitem__("captured_state", st))

@pii.protect
async def chat_stream(message: In) -> AsyncIterator[Out]:
    # Inside: message.content is redacted (e.g., john@example.com → [EMAIL_1])
    async for chunk in llm.stream(message.content):
        # The LLM sees tokens and may emit them in its text
        # On exit, the decorator unredacts using the input-based map
        yield Out(content=chunk)

# Consumers get unredacted values; tokens never leak outside the bubble
async for part in chat_stream(In(content="Email me at john@example.com")):
    print(part.content)  # e.g., "Thanks, I’ll email john@example.com"
```

Notes:
- Works with async and sync generators alike.
- `on_stream_complete(state)` exposes the final input-based `RedactionState` for persistence or auditing; it isn’t needed to consume the stream.

### Streaming State Tracking

- Source of truth: only function arguments build the redaction map.
- Unredaction on exit: every yielded or returned value is unredacted using that map.
- Persistence hook: capture the final input-derived `RedactionState` with `on_stream_complete` if you need to store state for later unredaction.

### 3) Containers: Lists, Dicts, Sets, Tuples, Frozensets

No special casing required — Redactyl traverses common containers in both inputs and return values.

```python
from typing import Any
from pydantic import BaseModel
from redactyl.pydantic_integration import PIIConfig

class Profile(BaseModel):
    name: str
    email: str

pii = PIIConfig()  # traverse_containers=True by default

@pii.protect
def analyze(batch: list[Profile] | dict[str, Any] | set[str]) -> dict[str, Any]:
    # All nested strings/models are protected here
    # You can safely pass "batch" to your LLM/tooling
    summary = llm.summarize(batch)
    # Return values (including containers) are unredacted on the way out
    return {"summary": summary}

out = analyze([
    Profile(name="Ada Lovelace", email="ada@example.com"),
    Profile(name="Alan Turing", email="alan@example.com"),
])
print(out["summary"])  # contains real names/emails again
```

## Install

```bash
pip install redactyl

# Optional: better name component detection
pip install "redactyl[gliner]"

# Required spaCy model
python -m spacy download en_core_web_sm
```

## Why Tokens (Not Fake Data)?

- LLMs pass structured placeholders like `[EMAIL_1]` through verbatim; realistic fake data tends to get paraphrased or altered.
- We track name components intelligently so short mentions like "John" map back to the same person as "John Smith".
- Every token is perfectly reversible — outputs come back with originals.
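To make the round-trip concrete, here is a minimal sketch of the token idea (not Redactyl's implementation): redaction builds a token-to-original map from the input, and unredaction is a literal substitution using that same map, so the output restores exactly what went in.

```python
import re

# Toy round-trip (NOT Redactyl's implementation): build a token→original map,
# then restore by literal substitution using that same map.
def redact(text: str) -> tuple[str, dict[str, str]]:
    mapping: dict[str, str] = {}

    def repl(m: re.Match) -> str:
        token = f"[EMAIL_{len(mapping) + 1}]"
        mapping[token] = m.group(0)  # remember the original value
        return token

    return re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", repl, text), mapping

def unredact(text: str, mapping: dict[str, str]) -> str:
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text

redacted, mapping = redact("Write to ada@example.com today")
# redacted == "Write to [EMAIL_1] today"
assert unredact(redacted, mapping) == "Write to ada@example.com today"
```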

## Pydantic-Friendly API Surface

- `@pii.protect`: Auto-protects Pydantic `BaseModel` args, traverses containers, and unprotects returns and yields (membrane behavior).
- Function modes: Detects sync, async, generator, and async-generator transparently.
- `pii(...)`: Annotate fields for explicit types or to disable detection per-field.
- Callbacks: `on_detection`, `on_hallucination`, `on_gliner_unavailable`, `on_batch_error`, `on_unredaction_issue`, `on_gliner_model_error`.
- Streaming: yields are unredacted to callers; `on_stream_complete(state)` exposes the final `RedactionState` for persistence.

## Keep Tokens with `pii(unredact=False)`

Sometimes you need redacted tokens to remain redacted in outputs (e.g., audit logs, downstream pipelines, or compliance scenarios). You can mark output fields to never unredact with `pii(unredact=False)`. Unredaction is treated as a subtree toggle: nothing within that field is unredacted.

Example:

```python
from typing import Annotated
from pydantic import BaseModel
from redactyl.pydantic_integration import PIIConfig, pii
from redactyl.types import PIIType

class Input(BaseModel):
    name: str
    email: str

class AuditLog(BaseModel):
    expose_email: Annotated[str, pii(PIIType.EMAIL)]       # unredacts by default
    audit_email: Annotated[str, pii(unredact=False)]       # stays as [EMAIL_1]
    message: str

config = PIIConfig()  # use default detectors

@config.protect
def create_log(inp: Input) -> AuditLog:
    # Inside: inp.name and inp.email are redacted tokens
    return AuditLog(
        expose_email=inp.email,
        audit_email=inp.email,          # remains token on exit
        message=f"Processed {inp.name}" # unredacts on exit
    )

out = create_log(Input(name="John Doe", email="john@example.com"))
assert out.expose_email == "john@example.com"
assert out.audit_email.startswith("[EMAIL_")   # stays redacted
assert "John Doe" in out.message
```

Notes:
- Default behavior is `unredact=True`, preserving backward compatibility.
- `unredact=False` applies to the entire field subtree (nested models/containers).
- Precedence: `detect=False` means the field is skipped entirely during both protection and unprotection.

## Known Limitations

- **Text Length**: The underlying spaCy models have a maximum text length of 1 million characters. Texts exceeding this limit will raise an error. For longer documents, consider processing them in chunks.
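A simple way to stay under the limit is to split long documents before handing them to a protected function. The helper below is a hypothetical sketch (not part of Redactyl's API); cutting at paragraph breaks keeps entities intact more often than cutting mid-sentence, and the limit value is illustrative.

```python
# Hypothetical chunking helper for texts over spaCy's default 1,000,000-character
# limit (not part of Redactyl's API). Prefers paragraph boundaries for cuts.
def chunk_text(text: str, limit: int = 900_000) -> list[str]:
    chunks: list[str] = []
    while len(text) > limit:
        # Cut at the last paragraph break inside the window, if there is one.
        cut = text.rfind("\n\n", 0, limit)
        if cut <= 0:
            cut = limit  # no break found: hard cut at the limit
        chunks.append(text[:cut])
        text = text[cut:]
    if text:
        chunks.append(text)
    return chunks

parts = chunk_text("a" * 10 + "\n\n" + "b" * 10, limit=15)
# nothing is lost across the split
assert "".join(parts) == "a" * 10 + "\n\n" + "b" * 10
```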

## v0.2.0 Highlights

- Containers: Deep traversal for `list`, `dict`, `set`, `tuple`, and `frozenset` (both inputs and returns).
- Streaming membrane: generators now unredact yields; callers see original PII.
- Streaming persistence: `on_stream_complete` surfaces the final `RedactionState` after generator completion.
- Name components: Full-name phrases establish the source of truth; partials reuse the same index.
- Smarter decorator: Auto-detects async/sync, generator/async-generator; protects models; unprotects returns.
- Quality: 100% test pass rate (206/206 tests).

## Configuration Cheatsheet

```python
import logging

from redactyl.pydantic_integration import HallucinationResponse, PIIConfig

log = logging.getLogger(__name__)

pii = PIIConfig(
    batch_detection=True,        # speed + consistent numbering across fields
    use_name_parsing=True,       # parse title/first/middle/last when available
    fuzzy_unredaction=False,     # set True to allow fuzzy matches on restore
    traverse_containers=True,    # enable container traversal
    on_detection=lambda es: log.info("%d entities", len(es)),
    on_hallucination=lambda issues: [
        # replace hallucinated emails; preserve others
        HallucinationResponse.replace("[REDACTED]") if "EMAIL" in i.token else HallucinationResponse.preserve()
        for i in issues
    ],
)
```

### Common customizations

```python
# Custom detector
pii = PIIConfig(detector=MyCustomDetector())

# Batch processing for consistency and speed
pii = PIIConfig(batch_detection=True)

# Handle hallucinations
from redactyl.pydantic_integration import HallucinationResponse
def handle_llm_mistakes(issues):
    return [
        HallucinationResponse.replace("[REDACTED]") if "EMAIL" in i.token else HallucinationResponse.preserve()
        for i in issues
    ]
pii = PIIConfig(on_hallucination=handle_llm_mistakes)
```

Field-level control with `pii`:

```python
from typing import Annotated
from pydantic import BaseModel
from redactyl.pydantic_integration import pii
from redactyl.types import PIIType

class User(BaseModel):
    # force detection as a PERSON and parse components
    name: Annotated[str, pii(PIIType.PERSON, parse_components=True)]
    # mark as email explicitly
    email: Annotated[str, pii(PIIType.EMAIL)]
    # or disable detection for a field
    notes: Annotated[str, pii(detect=False)]
```

## Development

```bash
uv python pin 3.12
uv pip install -e ".[dev]"
uv run python -m spacy download en_core_web_sm

uv run ruff check --fix && uv run ruff format
uv run pyright src/
uv run pytest -q
```

## License

MIT — see LICENSE.

See CHANGELOG.md for release notes.
