datafog-instructor


Name: datafog-instructor
Version: 0.1.0b8
Home page: https://datafog.ai
Summary: Scan, redact, and manage PII in your documents before they get uploaded to a Retrieval Augmented Generation (RAG) system.
Upload time: 2024-07-02 18:58:29
Maintainer: DataFog
Author: Sid Mohan
Requires Python: >=3.10
License: MIT
Keywords: pii, redaction, nlp, rag, retrieval augmented generation, entity recognition
Requirements: pydantic, ollama, ollama-instructor, python-dotenv, setuptools, openai, pytest, black, flake8, mypy, functools32, types-requests, aiohttp, click, sphinx, sphinx-rtd-theme

# DataFog Instructor SDK

DataFog Instructor is a Python SDK for named entity recognition (NER) using Ollama as the LLM backend. It provides an easy-to-use interface for detecting and classifying entities in text.

## Installation

Install the DataFog Instructor SDK with pip:

```
pip install datafog-instructor
```

To also install the testing and documentation tools used for development:

```
pip install "datafog-instructor[dev,docs]"
```

## Quick Start

Here's a simple example to get you started with DataFog Instructor:

```python
from datafog_instructor import DataFog

# Initialize DataFog with default settings
# (Ollama at http://localhost:11434, model "phi3")
datafog = DataFog()

# Detect entities in text
text = "Cisco acquires Hess for $20 billion"
result = datafog.detect_entities(text)

# Print results
for entity in result.entities:
    print(f"Text: {entity.text}, Type: {entity.type.value}")
```

## Configuration

You can customize the DataFog instance using environment variables:

- `DATAFOG_LLM_BACKEND`: The LLM backend to use; "ollama" is currently the only supported value
- `DATAFOG_LLM_ENDPOINT`: The host URL for the Ollama service (default: "http://localhost:11434")
- `DATAFOG_LLM_MODEL`: The model to use for entity detection (default: "phi3")

Example with custom settings:

```python
import os
os.environ['DATAFOG_LLM_ENDPOINT'] = 'http://custom-ollama-host:11434'
os.environ['DATAFOG_LLM_MODEL'] = 'custom-model'

from datafog_instructor import DataFog

datafog = DataFog()
```
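
To make the fallback behaviour concrete, here is a minimal sketch of how the three variables could resolve to effective settings. The helper `resolve_llm_settings` is hypothetical and not part of the datafog-instructor API. The endpoint and model defaults are the ones documented above, while treating "ollama" as the default backend is an assumption.

```python
import os
from typing import Mapping

# Endpoint and model defaults are taken from the list above.
# Using "ollama" as the backend default is an assumption.
DEFAULTS = {
    "DATAFOG_LLM_BACKEND": "ollama",
    "DATAFOG_LLM_ENDPOINT": "http://localhost:11434",
    "DATAFOG_LLM_MODEL": "phi3",
}


def resolve_llm_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Hypothetical helper: return each setting, falling back to its default."""
    settings = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    if settings["DATAFOG_LLM_BACKEND"] != "ollama":
        raise ValueError(
            f"Unsupported backend: {settings['DATAFOG_LLM_BACKEND']!r}"
        )
    return settings


if __name__ == "__main__":
    # Only the model is overridden; the endpoint keeps its default.
    print(resolve_llm_settings({"DATAFOG_LLM_MODEL": "custom-model"}))
```

Passing a plain dictionary instead of `os.environ` keeps the sketch easy to try without touching the real environment.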

## Features

### Detect Entities

Use the `detect_entities` method to identify and classify named entities in a given text:

```python
# Assumes `datafog = DataFog()` as created in the Quick Start
text = "Apple Inc. reported $100 billion in revenue for Q4 2023"
result = datafog.detect_entities(text)

for entity in result.entities:
    print(f"Text: {entity.text}, Type: {entity.type.value}")
```
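
Once entities are detected, one possible next step is to mask them before the text is stored or indexed. The sketch below is not an SDK feature: `redact`, `Entity`, and `EntityType` are hypothetical stand-ins that only mirror the `text` and `type.value` attributes used above, so the example runs without an Ollama server.

```python
from dataclasses import dataclass
from enum import Enum


class EntityType(Enum):
    """Stand-in for the SDK's entity type; the values are assumptions."""
    ORG = "ORG"
    MONEY = "MONEY"


@dataclass
class Entity:
    """Stand-in exposing the same attributes as the entities printed above."""
    text: str
    type: EntityType


def redact(text: str, entities: list) -> str:
    """Replace each entity's text with a bracketed type label."""
    # Longest entities first, so a short entity never clobbers part of a longer one.
    for entity in sorted(entities, key=lambda e: len(e.text), reverse=True):
        text = text.replace(entity.text, f"[{entity.type.value}]")
    return text


sample = "Apple Inc. reported $100 billion in revenue for Q4 2023"
found = [
    Entity("Apple Inc.", EntityType.ORG),
    Entity("$100 billion", EntityType.MONEY),
]
print(redact(sample, found))  # [ORG] reported [MONEY] in revenue for Q4 2023
```

Plain string replacement masks every occurrence of an entity's text; a span-based approach would be needed if only specific occurrences should be redacted.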

### Manage Entity Types

You can add or remove entity types dynamically:

```python
# Assumes `datafog = DataFog()` as created in the Quick Start
# Add a new entity type
datafog.add_entity_type("CUSTOM", "Custom Entity")

# Remove an entity type
datafog.remove_entity_type("CUSTOM")

# Get all entity types
entity_types = datafog.get_entity_types()
print(entity_types)
```
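
For readers curious how a set of entity types can change at runtime while still being exposed as enum-like values (the examples above read `entity.type.value`), here is one way it could be modelled. This is an illustrative sketch only: `EntityTypeRegistry` is hypothetical and makes no claim about how the SDK implements `add_entity_type` or `remove_entity_type`.

```python
from enum import Enum


class EntityTypeRegistry:
    """Hypothetical registry; not the SDK's implementation."""

    def __init__(self, initial: dict):
        self._types = dict(initial)

    def add(self, name: str, value: str) -> None:
        if name in self._types:
            raise ValueError(f"Entity type {name!r} already exists")
        self._types[name] = value

    def remove(self, name: str) -> None:
        if name not in self._types:
            raise ValueError(f"Unknown entity type {name!r}")
        del self._types[name]

    def as_enum(self):
        # Enum's functional API builds a new class from a name -> value mapping.
        return Enum("EntityType", self._types)


registry = EntityTypeRegistry({"ORG": "Organization", "PERSON": "Person"})
registry.add("CUSTOM", "Custom Entity")
print([member.name for member in registry.as_enum()])  # ['ORG', 'PERSON', 'CUSTOM']
registry.remove("CUSTOM")
print([member.name for member in registry.as_enum()])  # ['ORG', 'PERSON']
```

Rebuilding the enum on demand sidesteps the fact that members cannot be added to an existing `Enum` class after it is defined.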

## Default Entity Types

The SDK comes with a list of predefined entity types, including:

- Organization Information: ORG, PERSON, TRANSACTION_TYPE, DEAL_STRUCTURE, FINANCIAL_INFO, PRODUCT, LOCATION, DATE, INDUSTRY, ROLE, REGULATORY, SENSITIVE_INFO, CONTACT, ID, STRATEGY, COMPANY, MONEY
- Personal Information: EMAIL, PHONE, SSN, CREDIT_CARD, IP_ADDRESS, URL, AGE, NATIONALITY, JOB_TITLE, EDUCATION
- Location Information: ADDRESS, CITY, STATE, ZIP, COUNTRY, REGION
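
The grouping above can be handy when deciding how to treat a detected entity, for example masking everything in the personal category. The lookup below is a small sketch built only from the names listed in this section; `ENTITY_CATEGORIES` and `category_of` are hypothetical helpers, not SDK objects.

```python
from typing import Optional

# Category -> type names, copied from the list above.
ENTITY_CATEGORIES = {
    "Organization Information": {
        "ORG", "PERSON", "TRANSACTION_TYPE", "DEAL_STRUCTURE", "FINANCIAL_INFO",
        "PRODUCT", "LOCATION", "DATE", "INDUSTRY", "ROLE", "REGULATORY",
        "SENSITIVE_INFO", "CONTACT", "ID", "STRATEGY", "COMPANY", "MONEY",
    },
    "Personal Information": {
        "EMAIL", "PHONE", "SSN", "CREDIT_CARD", "IP_ADDRESS", "URL", "AGE",
        "NATIONALITY", "JOB_TITLE", "EDUCATION",
    },
    "Location Information": {
        "ADDRESS", "CITY", "STATE", "ZIP", "COUNTRY", "REGION",
    },
}


def category_of(type_name: str) -> Optional[str]:
    """Return the category a type name belongs to, or None if it is not listed."""
    for category, names in ENTITY_CATEGORIES.items():
        if type_name in names:
            return category
    return None


print(category_of("SSN"))      # Personal Information
print(category_of("ZIP"))      # Location Information
print(category_of("UNKNOWN"))  # None
```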

## Error Handling

If the SDK cannot process the response, or the response arrives in an unexpected format, it raises a `ValueError` with details about the error.

## Development and Testing

For development purposes, you can install additional dependencies:

```
pip install "datafog-instructor[dev]"
```

This includes tools like pytest, black, flake8, and mypy for testing and code quality.

## Documentation

To build the documentation locally:

```
pip install "datafog-instructor[docs]"
cd docs
make html
```

The documentation will be available in the `docs/_build/html` directory.

## Contributing

Contributions to the DataFog Instructor SDK are welcome! Please feel free to submit a Pull Request.

## License

This project is licensed under the MIT License.

## Support

If you encounter any problems or have any questions, please open an issue on the GitHub repository or join our Discord community at https://discord.gg/bzDth394R4.

## Links

- Homepage: https://datafog.ai
- Documentation: https://docs.datafog.ai
- Twitter: https://twitter.com/datafoginc
- GitHub: https://github.com/datafog/datafog-instructor

            

Raw data

{
    "_id": null,
    "home_page": "https://datafog.ai",
    "name": "datafog-instructor",
    "maintainer": "DataFog",
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": "hi@datafog.ai",
    "keywords": "pii, redaction, nlp, rag, retrieval augmented generation, entity recognition",
    "author": "Sid Mohan",
    "author_email": "sid@datafog.ai",
    "download_url": "https://files.pythonhosted.org/packages/5a/83/8d3378949b3ed5462c4dddc5286f3dd372c22607ebd69068b0610423944d/datafog_instructor-0.1.0b8.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Scan, redact, and manage PII in your documents before they get uploaded to a Retrieval Augmented Generation (RAG) system.",
    "version": "0.1.0b8",
    "project_urls": {
        "Discord": "https://discord.gg/bzDth394R4",
        "Documentation": "https://docs.datafog.ai",
        "GitHub": "https://github.com/datafog/datafog-instructor",
        "Homepage": "https://datafog.ai",
        "Twitter": "https://twitter.com/datafoginc"
    },
    "split_keywords": [
        "pii",
        " redaction",
        " nlp",
        " rag",
        " retrieval augmented generation",
        " entity recognition"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "5a838d3378949b3ed5462c4dddc5286f3dd372c22607ebd69068b0610423944d",
                "md5": "d4e6761447f65acc3b09f4b8d1b3a3d8",
                "sha256": "e1e7107a9c01b9a49f97a77cd5334079e250d7d9b18438026d4f917e13cd8dd1"
            },
            "downloads": -1,
            "filename": "datafog_instructor-0.1.0b8.tar.gz",
            "has_sig": false,
            "md5_digest": "d4e6761447f65acc3b09f4b8d1b3a3d8",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 6500,
            "upload_time": "2024-07-02T18:58:29",
            "upload_time_iso_8601": "2024-07-02T18:58:29.811864Z",
            "url": "https://files.pythonhosted.org/packages/5a/83/8d3378949b3ed5462c4dddc5286f3dd372c22607ebd69068b0610423944d/datafog_instructor-0.1.0b8.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-07-02 18:58:29",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "datafog",
    "github_project": "datafog-instructor",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "pydantic",
            "specs": [
                [
                    "==",
                    "2.7.1"
                ]
            ]
        },
        {
            "name": "ollama",
            "specs": [
                [
                    ">=",
                    "0.2.0"
                ],
                [
                    "<",
                    "0.3.0"
                ]
            ]
        },
        {
            "name": "ollama-instructor",
            "specs": [
                [
                    "==",
                    "0.2.0"
                ]
            ]
        },
        {
            "name": "python-dotenv",
            "specs": []
        },
        {
            "name": "setuptools",
            "specs": [
                [
                    "==",
                    "58.1.0"
                ]
            ]
        },
        {
            "name": "openai",
            "specs": [
                [
                    "==",
                    "1.12.0"
                ]
            ]
        },
        {
            "name": "pytest",
            "specs": [
                [
                    "==",
                    "8.0.0"
                ]
            ]
        },
        {
            "name": "black",
            "specs": [
                [
                    "==",
                    "24.1.1"
                ]
            ]
        },
        {
            "name": "flake8",
            "specs": [
                [
                    "==",
                    "7.0.0"
                ]
            ]
        },
        {
            "name": "mypy",
            "specs": [
                [
                    "==",
                    "1.8.0"
                ]
            ]
        },
        {
            "name": "functools32",
            "specs": [
                [
                    "==",
                    "3.2.3-2"
                ]
            ]
        },
        {
            "name": "types-requests",
            "specs": [
                [
                    "==",
                    "2.31.0.20240218"
                ]
            ]
        },
        {
            "name": "aiohttp",
            "specs": [
                [
                    "==",
                    "3.9.3"
                ]
            ]
        },
        {
            "name": "click",
            "specs": [
                [
                    "==",
                    "8.1.7"
                ]
            ]
        },
        {
            "name": "sphinx",
            "specs": [
                [
                    "==",
                    "7.2.6"
                ]
            ]
        },
        {
            "name": "sphinx-rtd-theme",
            "specs": [
                [
                    "==",
                    "2.0.0"
                ]
            ]
        }
    ],
    "lcname": "datafog-instructor"
}
        