datafog

Name	datafog
Version	4.0.0
home_page	https://datafog.ai
Summary	Scan, redact, and manage PII in your documents before they get uploaded to a Retrieval Augmented Generation (RAG) system.
upload_time	2024-08-30 20:49:57
maintainer	DataFog
docs_url	None
author	Sid Mohan
requires_python	<3.13,>=3.10
license	MIT
keywords	pii redaction nlp rag retrieval augmented generation

<p align="center">
  <a href="https://www.datafog.ai"><img src="public/colorlogo.png" alt="DataFog logo"></a>
</p>

<p align="center">
    <b>Open-source PII Detection & Anonymization</b>. <br />
</p>

<p align="center">
  <a href="https://pypi.org/project/datafog/"><img src="https://img.shields.io/pypi/v/datafog.svg?style=flat-square" alt="PyPi Version"></a>
  <a href="https://pypi.org/project/datafog/"><img src="https://img.shields.io/pypi/pyversions/datafog.svg?style=flat-square" alt="PyPI pyversions"></a>
  <a href="https://github.com/datafog/datafog-python"><img src="https://img.shields.io/github/stars/datafog/datafog-python.svg?style=flat-square&logo=github&label=Stars&logoColor=white" alt="GitHub stars"></a>
  <a href="https://pypistats.org/packages/datafog"><img src="https://img.shields.io/pypi/dm/datafog.svg?style=flat-square" alt="PyPi downloads"></a>
  <a href="https://discord.gg/bzDth394R4"><img src="https://img.shields.io/discord/1173803135341449227?style=flat" alt="Discord"></a>
  <a href="https://github.com/psf/black"><img src="https://img.shields.io/badge/code%20style-black-000000.svg?style=flat-square" alt="Code style: black"></a>
  <a href="https://codecov.io/gh/datafog/datafog-python"><img src="https://img.shields.io/codecov/c/github/datafog/datafog-python.svg?style=flat-square" alt="codecov"></a>
  <a href="https://github.com/datafog/datafog-python/issues"><img src="https://img.shields.io/github/issues/datafog/datafog-python.svg?style=flat-square" alt="GitHub Issues"></a>
</p>

## Installation

DataFog can be installed via pip:

```bash
pip install datafog
```

# CLI

## 📚 Quick Reference

| Command                        | Description                          |
| ------------------------------ | ------------------------------------ |
| `scan-text`                    | Analyze text for PII                 |
| `scan-image`                   | Extract and analyze text from images |
| `redact-text`                  | Redact PII in text                   |
| `replace-text`                 | Replace PII with anonymized values   |
| `hash-text`                    | Hash PII in text                     |
| `health`                       | Check service status                 |
| `show-config`                  | Display current settings             |
| `download-model`               | Get a specific spaCy model           |
| `show-spacy-model-directory`   | Show the directory of a spaCy model  |
| `list-spacy-models`            | Show available models                |
| `list-entities`                | View supported PII entities          |

---

## 🔍 Detailed Usage

### Scanning Text

To scan and annotate text for PII entities:

```bash
datafog scan-text "Your text here"
```

**Example:**

```bash
datafog scan-text "Tim Cook is the CEO of Apple and is based out of Cupertino, California"
```

### Scanning Images

To extract text from images and optionally perform PII annotation:

```bash
datafog scan-image "path/to/image.png" --operations extract
```

**Example:**

```bash
datafog scan-image "nokia-statement.png" --operations extract
```

To extract text and annotate PII:

```bash
datafog scan-image "nokia-statement.png" --operations scan
```

### Redacting Text

To redact PII in text:

```bash
datafog redact-text "Tim Cook is the CEO of Apple and is based out of Cupertino, California"
```

which should output:

```
[REDACTED] is the CEO of [REDACTED] and is based out of [REDACTED], [REDACTED]
```
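
To make the transformation concrete, here is a minimal sketch of span-based redaction using only the Python standard library. It is not DataFog's implementation: the `redact` helper and the hand-written entity list are assumptions for illustration, standing in for whatever the detector actually returns.

```python
def redact(text: str, spans: list[tuple[int, int]], mask: str = "[REDACTED]") -> str:
    """Replace each (start, end) character span in text with mask."""
    # Work right to left so earlier offsets stay valid after each replacement.
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + mask + text[end:]
    return text


text = "Tim Cook is the CEO of Apple and is based out of Cupertino, California"
# Hypothetical detections; a real detector would supply these spans.
entities = ["Tim Cook", "Apple", "Cupertino", "California"]
spans = [(text.index(e), text.index(e) + len(e)) for e in entities]
print(redact(text, spans))
```

Replacing from the end of the string backwards avoids having to recompute offsets after each substitution.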

### Replacing Text

To replace detected PII:

```bash
datafog replace-text "Tim Cook is the CEO of Apple and is based out of Cupertino, California"
```

which should return something like:

```
[PERSON_B86CACE6] is the CEO of [UNKNOWN_445944D7] and is based out of [UNKNOWN_32BA5DCA], [UNKNOWN_B7DF4969]
```

Note: A unique, randomly generated identifier is created for each detected entity.
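
As a rough illustration of how such placeholders could be produced, the sketch below builds tags in the same `[LABEL_XXXXXXXX]` shape from random bytes. The helper names (`make_placeholder`, `replace_entities`) and the entity-to-label mapping are hypothetical and are not part of DataFog; how the library actually generates its identifiers is not stated here.

```python
import secrets


def make_placeholder(label: str = "UNKNOWN") -> str:
    """Build a tag such as [PERSON_1A2B3C4D] from 4 random bytes."""
    return f"[{label}_{secrets.token_hex(4).upper()}]"


def replace_entities(text: str, entities: dict[str, str]) -> str:
    """Swap each entity string for a fresh placeholder carrying its label."""
    for entity, label in entities.items():
        text = text.replace(entity, make_placeholder(label))
    return text


text = "Tim Cook is the CEO of Apple"
# Hypothetical detections mapped to entity labels.
print(replace_entities(text, {"Tim Cook": "PERSON", "Apple": "UNKNOWN"}))
```

Because the suffix is random, the output differs on every run, which is why the example above is described as "something like" the shown result.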

### Hashing Text

You can hash detected PII with the SHA256, SHA3-256, or MD5 algorithm; SHA256 is the default. The hashed output does not preserve the length of the original entity, which helps protect privacy.

```bash
datafog hash-text "Tim Cook is the CEO of Apple and is based out of Cupertino, California"
```

This generates output that looks like this:

```
5738a37f0af81594b8a8fd677e31b5e2cabd6d7791c89b9f0a1c233bb563ae39 is the CEO of f223faa96f22916294922b171a2696d868fd1f9129302eb41a45b2a2ea2ebbfd and is based out of ab5f41f04096cf7cd314357c4be26993eeebc0c094ca668506020017c35b7a9c, cad0535decc38b248b40e7aef9a1cfd91ce386fa5c46f05ea622649e7faf18fb
```
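
The point about output length can be checked with Python's standard `hashlib` module. This is a sketch of the general property only: the `hash_entity` helper is hypothetical, and it makes no claim to reproduce DataFog's exact digests, since the library's input encoding and any preprocessing are not specified here.

```python
import hashlib

ALGORITHMS = {
    "SHA256": hashlib.sha256,
    "SHA3-256": hashlib.sha3_256,
    "MD5": hashlib.md5,
}


def hash_entity(entity: str, algorithm: str = "SHA256") -> str:
    """Return the hex digest of an entity string (UTF-8 encoded)."""
    return ALGORITHMS[algorithm](entity.encode("utf-8")).hexdigest()


for name in ALGORITHMS:
    short = hash_entity("Apple", name)
    longer = hash_entity("Cupertino, California", name)
    # Digest length depends on the algorithm only, never on the input.
    print(name, len(short), len(longer))
```

SHA256 and SHA3-256 both yield 64 hexadecimal characters and MD5 yields 32, regardless of how long the original entity was.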

### Utility Commands

#### 🏥 Health Check

```bash
datafog health
```

#### ⚙️ Show Configuration

```bash
datafog show-config
```

#### 📥 Download Model

```bash
datafog download-model en_core_web_sm
```

#### 📂 Show Model Directory

```bash
datafog show-spacy-model-directory en_core_web_sm
```

#### 📋 List Models

```bash
datafog list-spacy-models
```

#### 🏷️ List Entities

```bash
datafog list-entities
```

---

## ⚠️ Important Notes

- For `scan-image` and `scan-text` commands, use `--operations` to specify different operations. Default is `scan`.
- Process multiple images or text strings in a single command by providing multiple arguments.
- Ensure proper permissions and configuration of the DataFog service before running commands.
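
If you drive the CLI from a script, the notes above could be combined as in the following sketch. The `build_scan_text_command` helper is hypothetical, and placing the texts before `--operations` is an assumption; confirm the exact argument order with `datafog scan-text --help`.

```python
import shlex


def build_scan_text_command(texts: list[str], operations: str = "scan") -> list[str]:
    """Assemble argv for `datafog scan-text` with one argument per text."""
    return ["datafog", "scan-text", *texts, "--operations", operations]


argv = build_scan_text_command(["Tim Cook is the CEO of Apple", "Based in Cupertino"])
# Print a shell-quoted version for inspection.
print(shlex.join(argv))
# To execute it: subprocess.run(argv, check=True, capture_output=True, text=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when the text contains spaces or quotes.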

---

💡 **Tip:** For more detailed information on each command, use the `--help` option, e.g., `datafog scan-text --help`.

# Python SDK

## Getting Started

To use DataFog, you'll need to create a DataFog client with the desired operations. Here's a basic setup:

```python
from datafog import DataFog

# For text annotation
client = DataFog(operations="scan")

# For OCR (Optical Character Recognition)
ocr_client = DataFog(operations="extract")
```

## Text PII Annotation

Here's an example of how to annotate PII in a text document:

```python
import requests

from datafog import DataFog

client = DataFog(operations="scan")

# Fetch a sample medical record
doc_url = "https://gist.githubusercontent.com/sidmohan0/b43b72693226422bac5f083c941ecfdb/raw/b819affb51796204d59987893f89dee18428ed5d/note1.txt"
response = requests.get(doc_url, timeout=30)
response.raise_for_status()  # Fail early on HTTP errors
# Keep only non-empty lines
text_lines = [line for line in response.text.splitlines() if line.strip()]

# Run annotation
annotations = client.run_text_pipeline_sync(str_list=text_lines)
print(annotations)
```

## OCR PII Annotation

For OCR capabilities, you can use the following:

```python
import asyncio

import nest_asyncio

from datafog import DataFog

# Allow nested event loops, e.g. when running inside a Jupyter notebook
nest_asyncio.apply()

ocr_client = DataFog(operations="extract")


async def run_ocr_pipeline_demo():
    image_url = "https://s3.amazonaws.com/thumbnails.venngage.com/template/dc377004-1c2d-49f2-8ddf-d63f11c8d9c2.png"
    results = await ocr_client.run_ocr_pipeline(image_urls=[image_url])
    print("OCR Pipeline Results:", results)


asyncio.run(run_ocr_pipeline_demo())
```

Note: The DataFog library uses asynchronous programming for OCR, so make sure to use the `async`/`await` syntax when calling the appropriate methods.
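
For readers less familiar with `async`/`await`, the sketch below shows the general pattern of awaiting several coroutines from synchronous code. `fake_ocr` is a hypothetical stand-in for an awaitable OCR call and does not use DataFog at all; only the `asyncio` calls are real.

```python
import asyncio


async def fake_ocr(image_url: str) -> str:
    """Hypothetical stand-in for an awaitable OCR call."""
    await asyncio.sleep(0)  # Yield control, as real I/O would
    return f"text extracted from {image_url}"


async def process_all(image_urls: list[str]) -> list[str]:
    # gather() runs the coroutines concurrently and preserves input order.
    return await asyncio.gather(*(fake_ocr(url) for url in image_urls))


results = asyncio.run(process_all(["a.png", "b.png"]))
print(results)
```

In a plain script, `asyncio.run` is enough to start the event loop; `nest_asyncio` is only needed where a loop is already running, such as in a notebook.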

## Text Anonymization

DataFog provides various anonymization techniques to protect sensitive information. Here are examples of how to use them:

### Redacting Text

To redact PII in text:

```python
from datafog import DataFog
from datafog.config import OperationType

client = DataFog(operations=[OperationType.SCAN, OperationType.REDACT])

text = "Tim Cook is the CEO of Apple and is based out of Cupertino, California"
redacted_text = client.run_text_pipeline_sync([text])[0]
print(redacted_text)
```

Output:

```
[REDACTED] is the CEO of [REDACTED] and is based out of [REDACTED], [REDACTED]
```

### Replacing Text

To replace detected PII with unique identifiers:

```python
from datafog import DataFog
from datafog.config import OperationType

client = DataFog(operations=[OperationType.SCAN, OperationType.REPLACE])

text = "Tim Cook is the CEO of Apple and is based out of Cupertino, California"
replaced_text = client.run_text_pipeline_sync([text])[0]
print(replaced_text)
```

Output:

```
[PERSON_B86CACE6] is the CEO of [UNKNOWN_445944D7] and is based out of [UNKNOWN_32BA5DCA], [UNKNOWN_B7DF4969]
```

### Hashing Text

To hash detected PII:

```python
from datafog import DataFog
from datafog.config import OperationType
from datafog.models.anonymizer import HashType

client = DataFog(operations=[OperationType.SCAN, OperationType.HASH], hash_type=HashType.SHA256)

text = "Tim Cook is the CEO of Apple and is based out of Cupertino, California"
hashed_text = client.run_text_pipeline_sync([text])[0]
print(hashed_text)
```

Output:

```
5738a37f0af81594b8a8fd677e31b5e2cabd6d7791c89b9f0a1c233bb563ae39 is the CEO of f223faa96f22916294922b171a2696d868fd1f9129302eb41a45b2a2ea2ebbfd and is based out of ab5f41f04096cf7cd314357c4be26993eeebc0c094ca668506020017c35b7a9c, cad0535decc38b248b40e7aef9a1cfd91ce386fa5c46f05ea622649e7faf18fb
```

You can choose from SHA256 (default), SHA3-256, and MD5 hashing algorithms by specifying the `hash_type` parameter.

## Examples

For more detailed examples, check out our Jupyter notebooks in the `examples/` directory:

- `text_annotation_example.ipynb`: Demonstrates text PII annotation
- `image_processing.ipynb`: Shows OCR capabilities and text extraction from images

These notebooks provide step-by-step guides on how to use DataFog for various tasks.

### Dev Notes

For local development:

1. Clone the repository.
2. Navigate to the project directory:
   ```
   cd datafog-python
   ```
3. Create a new virtual environment (using `.venv` is recommended as it is hardcoded in the justfile):
   ```
   python -m venv .venv
   ```
4. Activate the virtual environment:
   - On Windows:
     ```
     .venv\Scripts\activate
     ```
   - On macOS/Linux:
     ```
     source .venv/bin/activate
     ```
5. Install the development dependencies:
   ```
   pip install -r requirements-dev.txt
   ```
6. Set up the project:
   ```
   just setup
   ```

Now, you can develop and run the project locally.

#### Important Actions:

- **Format the code**:
  ```
  just format
  ```
  This runs `isort` to sort imports.
- **Lint the code**:
  ```
  just lint
  ```
  This runs `flake8` to check for linting errors.
- **Generate coverage report**:
  ```
  just coverage-html
  ```
  This runs `pytest` and generates a coverage report in the `htmlcov/` directory.

We use [pre-commit](https://marketplace.visualstudio.com/items?itemName=elagil.pre-commit-helper) to run checks locally before committing changes. Once installed, you can run:

```
pre-commit run --all-files
```

#### Dependencies

For OCR, we use Tesseract, which is incorporated into the build step. You can find the relevant configurations under `.github/workflows/` in the following files:

- `dev-cicd.yml`
- `feature-cicd.yml`
- `main-cicd.yml`

### Testing

- Tested on Python 3.10

## License

This software is published under the [MIT
license](https://en.wikipedia.org/wiki/MIT_License).

Raw data

            {
    "_id": null,
    "home_page": "https://datafog.ai",
    "name": "datafog",
    "maintainer": "DataFog",
    "docs_url": null,
    "requires_python": "<3.13,>=3.10",
    "maintainer_email": "hi@datafog.ai",
    "keywords": "pii, redaction, nlp, rag, retrieval augmented generation",
    "author": "Sid Mohan",
    "author_email": "sid@datafog.ai",
    "download_url": "https://files.pythonhosted.org/packages/f3/c9/61520ddb69b4a07178d152a908dde55fb430326fc0a9fb15410690fa00e9/datafog-4.0.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Scan, redact, and manage PII in your documents before they get uploaded to a Retrieval Augmented Generation (RAG) system.",
    "version": "4.0.0",
    "project_urls": {
        "Discord": "https://discord.gg/bzDth394R4",
        "Documentation": "https://docs.datafog.ai",
        "GitHub": "https://github.com/datafog/datafog-python",
        "Homepage": "https://datafog.ai",
        "Twitter": "https://twitter.com/datafoginc"
    },
    "split_keywords": [
        "pii",
        " redaction",
        " nlp",
        " rag",
        " retrieval augmented generation"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a69ba3e46972bd161cf26bfb155d6bafa9c2505a2a68f160fe562bc9e8e3bdd4",
                "md5": "2505bdcda7ce6c75ac63252ff330eee6",
                "sha256": "1f7fc1e4bfaee389b38139b77d3eb788f7629dbe0b835441fa1f3bb9d8a16200"
            },
            "downloads": -1,
            "filename": "datafog-4.0.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "2505bdcda7ce6c75ac63252ff330eee6",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.13,>=3.10",
            "size": 34002,
            "upload_time": "2024-08-30T20:49:55",
            "upload_time_iso_8601": "2024-08-30T20:49:55.490488Z",
            "url": "https://files.pythonhosted.org/packages/a6/9b/a3e46972bd161cf26bfb155d6bafa9c2505a2a68f160fe562bc9e8e3bdd4/datafog-4.0.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f3c961520ddb69b4a07178d152a908dde55fb430326fc0a9fb15410690fa00e9",
                "md5": "bc7db61f7414de416c45c2af9bda7f24",
                "sha256": "086d8423b9ef4535dd22fdd6ddc7e181d31234ebe5091eb55e4813eef9029e06"
            },
            "downloads": -1,
            "filename": "datafog-4.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "bc7db61f7414de416c45c2af9bda7f24",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<3.13,>=3.10",
            "size": 29536,
            "upload_time": "2024-08-30T20:49:57",
            "upload_time_iso_8601": "2024-08-30T20:49:57.171535Z",
            "url": "https://files.pythonhosted.org/packages/f3/c9/61520ddb69b4a07178d152a908dde55fb430326fc0a9fb15410690fa00e9/datafog-4.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-08-30 20:49:57",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "datafog",
    "github_project": "datafog-python",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "tox": true,
    "lcname": "datafog"
}
